Thomas Wies (Ed.) 




32nd European Symposium on Programming, ESOP 2023 
Held as Part of the European Joint Conferences 

on Theory and Practice of Software, ETAPS 2023 

Paris, France, April 22-27, 2023 

Proceedings 


Springer OPEN ACCESS


Lecture Notes in Computer Science 13990 


Founding Editors 


Gerhard Goos, Germany 
Juris Hartmanis, USA 


Editorial Board Members 


Elisa Bertino, USA
Wen Gao, China
Bernhard Steffen, Germany
Moti Yung, USA


Advanced Research in Computing and Software Science 


Subline of Lecture Notes in Computer Science 


Subline Series Editors 


Giorgio Ausiello, University of Rome ‘La Sapienza’, Italy 
Vladimiro Sassone, University of Southampton, UK 


Subline Advisory Board 


Susanne Albers, TU Munich, Germany 

Benjamin C. Pierce, University of Pennsylvania, USA 
Bernhard Steffen, University of Dortmund, Germany

Deng Xiaotie, Peking University, Beijing, China 

Jeannette M. Wing, Microsoft Research, Redmond, WA, USA 


More information about this series at https://link.springer.com/bookseries/558 


Thomas Wies 
Editor 


Programming 
Languages 
and Systems 




Editor 
Thomas Wies


New York University 
New York, NY, USA 


ISSN 0302-9743 ISSN 1611-3349 (electronic) 
Lecture Notes in Computer Science 
ISBN 978-3-031-30043-1 ISBN 978-3-031-30044-8 (eBook) 


https://doi.org/10.1007/978-3-031-30044-8


© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication. 

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International 
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution 
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and 
the source, provide a link to the Creative Commons license and indicate if changes were made. 

The images or other third party material in this book are included in the book's Creative Commons license, 
unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative 
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, 
you will need to obtain permission directly from the copyright holder. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication 
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant 
protective laws and regulations and therefore free for general use. 

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are 
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors 
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or 
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in 
published maps and institutional affiliations. 


This Springer imprint is published by the registered company Springer Nature Switzerland AG 
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland 


ETAPS Foreword 


Welcome to the 26th ETAPS! ETAPS 2023 took place in Paris, the beautiful capital of 
France. ETAPS 2023 was the 26th instance of the European Joint Conferences on 
Theory and Practice of Software. ETAPS is an annual federated conference established 
in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each 
conference has its own Program Committee (PC) and its own Steering Committee 
(SC). The conferences cover various aspects of software systems, ranging from theoretical
computer science to foundations of programming languages, analysis tools, and
formal approaches to software engineering. Organising these conferences in a coherent, 
highly synchronized conference programme enables researchers to participate in an 
exciting event, having the possibility to meet many colleagues working in different 
directions in the field, and to easily attend talks of different conferences. On the 
weekend before the main conference, numerous satellite workshops took place that 
attracted many researchers from all over the globe. 

ETAPS 2023 received 361 submissions in total, 124 of which were accepted, 
yielding an overall acceptance rate of 34.3%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions,
and in particular the PC (co-)chairs for their hard work in running this entire
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2023 featured the unifying invited speakers Véronique Cortier (CNRS, 
LORIA laboratory, France) and Thomas A. Henzinger (Institute of Science and 
Technology, Austria) and the conference-specific invited speakers Mooly Sagiv (Tel 
Aviv University, Israel) for ESOP and Sven Apel (Saarland University, Germany) for 
FASE. Invited tutorials were provided by Ana-Lucia Varbanescu (University of 
Twente and University of Amsterdam, The Netherlands) on heterogeneous computing 
and Joost-Pieter Katoen (RWTH Aachen, Germany and University of Twente, The 
Netherlands) on probabilistic programming. 

As part of the programme we had the second edition of TOOLympics, an event to 
celebrate the achievements of the various competitions or comparative evaluations in 
the field of ETAPS. 

ETAPS 2023 was organized jointly by Sorbonne Université and Université
Sorbonne Paris Nord. Sorbonne Université (SU) is a multidisciplinary,
research-intensive and world-class academic institution. It was created in 2018 as the
merger of two first-class research-intensive universities, UPMC (Université Pierre et
Marie Curie) and Paris-Sorbonne. SU has three faculties: humanities, medicine, and
science. It has 55,600 students (4,700 PhD students; 10,200 international students),
6,400 teachers, professor-researchers and 3,600 administrative and technical staff
members. Université Sorbonne Paris Nord is one of the thirteen universities that
succeeded the University of Paris in 1968. It is a major teaching and research center
located in the north of Paris. It has five campuses, spread over the two departments of
Seine-Saint-Denis and Val d’Oise: Villetaneuse, Bobigny, Saint-Denis, the Plaine
Saint-Denis and Argenteuil. The university has more than 25,000 students in different
fields, such as health, medicine, languages, humanities, and science. The local
organization team consisted of Fabrice Kordon (general co-chair), Laure Petrucci
(general co-chair), Benedikt Bollig (workshops), Stefan Haar (workshops), Etienne André
(proceedings and tutorials), Céline Ghibaudo (sponsoring), Denis Poitrenaud (web),
Stefan Schwoon (web), Benoit Barbot (publicity), Nathalie Sznajder (publicity),
Anne-Marie Reytier (communication), Hélène Pétridis (finance) and Véronique Criart
(finance).

ETAPS 2023 is further supported by the following associations and societies: 
ETAPS e.V., EATCS (European Association for Theoretical Computer Science), 
EAPLS (European Association for Programming Languages and Systems), EASST 
(European Association of Software Science and Technology), Lip6 (Laboratoire 
d'Informatique de Paris 6), LIPN (Laboratoire d'informatique de Paris Nord), Sorbonne 
Université, Université Sorbonne Paris Nord, CNRS (Centre national de la recherche 
scientifique), CEA (Commissariat à l'énergie atomique et aux énergies alternatives),
LMF (Laboratoire méthodes formelles), and Inria (Institut national de recherche en 
informatique et en automatique). 

The ETAPS Steering Committee consists of an Executive Board, and representatives
of the individual ETAPS conferences, as well as representatives of EATCS,
EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken),
Marieke Huisman (Twente, chair), Jan Kofron (Prague), Barbara König
(Duisburg), Thomas Noll (Aachen), Caterina Urban (Inria), Jan Křetínský (Munich),
and Lenore Zuck (Chicago).

Other members of the steering committee are: Dirk Beyer (Munich), Luis Caires 
(Lisboa), Ana Cavalcanti (York), Bernd Finkbeiner (Saarland), Reiko Heckel 
(Leicester), Joost-Pieter Katoen (Aachen and Twente), Naoki Kobayashi (Tokyo), 
Fabrice Kordon (Paris), Laura Kovacs (Vienna), Orna Kupferman (Jerusalem), Leen 
Lambers (Cottbus), Tiziana Margaria (Limerick), Andrzej Murawski (Oxford), Laure 
Petrucci (Paris), Elizabeth Polgreen (Edinburgh), Peter Ryan (Luxembourg), Sriram 
Sankaranarayanan (Boulder), Don Sannella (Edinburgh), Natasha Sharygina (Lugano), 
Pawel Sobocinski (Tallinn), Sebastian Uchitel (London and Buenos Aires), Andrzej 
Wasowski (Copenhagen), Stephanie Weirich (Pennsylvania), Thomas Wies (New 
York), Anton Wijs (Eindhoven), and James Worrell (Oxford). 

I would like to take this opportunity to thank all authors, keynote speakers, attendees,
organizers of the satellite workshops, and Springer-Verlag GmbH for their
support. I hope you all enjoyed ETAPS 2023. 

Finally, a big thanks to Laure and Fabrice and their local organization team for all 
their enormous efforts to make ETAPS a fantastic event. 


April 2023 Marieke Huisman 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


This volume contains the papers accepted at the 32nd European Symposium on
Programming (ESOP 2023), held during April 22-27, 2023, in Paris, France. ESOP is one
of the European Joint Conferences on Theory and Practice of Software (ETAPS); it is
dedicated to fundamental issues in the specification, design, analysis, and
implementation of programming languages and systems.

The 20 papers in this volume were selected from 55 submissions based on their 
originality and quality. One submission was desk rejected due to formatting issues. 
Each of the remaining submissions received at least three reviews. Authors were given 
the opportunity to respond to the initial reviews of their papers during the rebuttal 
period, December 6-8, 2022. Afterwards, the papers were discussed by the 30 Program 
Committee (PC) members and the 37 external reviewers. ESOP 2023 followed a 
double-blind review process. Roland Meyer kindly handled the two papers for which 
the PC Chair had conflicts of interest. 

ESOP 2023 continued the artifact evaluation process established by ESOP 2022. For 
this edition, the evaluation was conducted by a joint Artifact Evaluation Committee 
(AEC) with FoSSaCS 2023. Authors of accepted papers were invited to submit artifacts,
such as code, datasets, and mechanized proofs that supported the conclusions
of their papers. The AEC members read the papers and explored the artifacts, assessing 
their quality and checking that they supported the authors’ claims. The authors of seven 
of the accepted papers submitted artifacts, which were evaluated by 21 AEC members, 
with each artifact receiving at least three reviews. Authors of papers with accepted 
artifacts were assigned official EAPLS artifact evaluation badges, indicating that they 
have taken the extra time and have undergone the extra scrutiny to prepare a useful 
artifact. The ESOP 2023 AEC awarded Artifact Functional, Artifact (Functional and) 
Reusable, and Artifact Available badges. All submitted artifacts were deemed Functional
and Available, and all but two were also found to be Reusable.

I sincerely thank everyone who contributed to the success of the conference. 
Foremost, my deep gratitude goes to the authors who submitted their works for review, 
providing the basis for an exciting conference program. I would like to thank the 
members of the ESOP 2023 Program Committee for their detailed and constructive 
reviews, and for their active participation in the online discussions. The external 
reviewers provided additional expertise that was often crucial to arrive at an informed 
decision. For this, they have my deepest gratitude. I also thank Niccolo Veltri and 
Sebastian Wolff for serving as co-chairs of the joint ESOP/FoSSaCS 2023 Artifact 
Evaluation Committee. It was an honor to work with all of you! Finally, I would like to
thank all who contributed to the organization of ESOP 2023: the ESOP steering
committee and its chairs Luis Caires and Peter Thiemann, as well as the ETAPS
steering committee and its chair Marieke Huisman, who often provided helpful guidance
and feedback.


Logics for Extensional, Locally Complete
Analysis via Domain Refinements *


Flavio Ascari, Roberto Bruni, and Roberta Gori


Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo 3, Pisa, Italy,
flavio.ascari@phd.unipi.it, {roberto.bruni,roberta.gori}@unipi.it


Abstract. Abstract interpretation is a framework to design sound static
analyses by over-approximating the set of program behaviours. While
over-approximations can prove correctness, they cannot witness incorrectness
because false alarms may arise. An ideal, but uncommon, situation is
completeness of the abstraction, which ensures that no false alarm
is introduced by the abstract interpreter. Local Completeness Logic is a
proof system that can decide both correctness and incorrectness of a program:
any provable triple $\vdash_A [P]\ c\ [Q]$ in the logic implies completeness
of an intensional abstraction of program $c$ on input $P$ and is such that
$Q$ can be used to decide (in)correctness. However, completeness itself is
an extensional property of the function computed by the program, while
the above intensional analysis depends on the way the program is written,
and therefore not all valid triples can be derived in the proof system. Our
main contribution is the study of new inference rules which allow one to
perform part of the intensional analysis in a more precise abstract domain,
and then to transfer the result back to the coarser domain. With
these new rules, all (extensionally) valid triples can be derived in the
proof system, thus untying the set of provable properties from the way
the program is written.


Keywords: Abstract interpretation, Completeness in abstract interpretation,
Hoare logic, Abstract domain refinement, Extensionality


1 Introduction 


Static program analysis has been widely used to help developers produce valid
software. Among static analysis techniques, abstract interpretation [6,7] is a
general formalism to define sound-by-construction over-approximations that has
been successfully applied in many fields, such as model checking, security and
optimization [8]. Static analyses are often defined as over-approximations, that
is, the analysis computes a superset of the behaviors. This leads to no false
negatives, that is, all issues of the software are identified by the analysis, but it
can cause false alarms: an incorrect behavior may be an artifact of the analysis,
added by the over-approximation. While the absence of false negatives allowed
a wide applicability of abstract interpretation techniques, it also makes tools less
reliable in identifying bugs. In fact, in many industrial applications any false alarm
reported by the analysis to the developers diminishes its credibility, making it
less effective in practice. This argument has recently led to the development of
a logic of under-approximations, called incorrectness logic [16,17].


* Research supported by MIUR PRIN Project 201784YSZ5 ASPRA-Analysis of
Program Analyses.


The Problem. In abstract interpretation, an ideal situation is completeness.
Given an expressible specification, that is, one represented exactly in the abstract
domain, a complete abstraction reports no false alarms. In its most widespread
formulation [7], completeness is a global property: a program $c$ is complete in the
abstraction $A$ if a condition holds for all possible inputs. Let $C$ be the concrete
domain and $\llbracket c \rrbracket : C \to C$ be the (collecting) denotational semantics
of $c$. Given an abstract domain $A$, a concretization function $\gamma : A \to C$ and an
abstraction function $\alpha : C \to A$, an abstract interpreter
$\llbracket c \rrbracket^{\sharp}_{A} : A \to A$ is complete in $A$ if for all possible
inputs $P$ we have $\llbracket c \rrbracket^{\sharp}_{A}\,\alpha(P) = \alpha(\llbracket c \rrbracket P)$.
Unfortunately, because of the universal quantification over the possible inputs, this
condition is difficult to meet in practice. Moreover, in most cases completeness is
checked on an intensional abstraction of $\llbracket c \rrbracket$, computed by an
abstract interpreter $\llbracket c \rrbracket^{\sharp}_{A}$ by induction on the syntax,
making completeness an intensional property dependent on the program syntax [10].
However, in principle completeness is an extensional property, that only depends on the
best correct abstraction $\llbracket c \rrbracket^{A}$ of $\llbracket c \rrbracket$ in $A$,
defined by $\llbracket c \rrbracket^{A} \triangleq \alpha \llbracket c \rrbracket \gamma$.
We sum up what we may call intensional (on the left) and extensional (on the right)
completeness in the following equations:

$$\llbracket c \rrbracket^{\sharp}_{A}\,\alpha = \alpha\,\llbracket c \rrbracket
\qquad\qquad
\llbracket c \rrbracket^{A}\,\alpha = \alpha\,\llbracket c \rrbracket\,\gamma\,\alpha = \alpha\,\llbracket c \rrbracket
\tag{1}$$

We show the difference between $\llbracket c \rrbracket^{A}$ and
$\llbracket c \rrbracket^{\sharp}_{A}$ in the following example.


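Example 1 (Extensional and intensional properties). Consider the concrete domain
of sets of integers and the abstract domain of signs, $\mathsf{Sign}$. Its Hasse
diagram has four levels, listed here from top to bottom and ordered by inclusion of
the sets they denote:

    $\mathbb{Z}$
    $\mathbb{Z}_{\le 0}$    $\mathbb{Z}_{\neq 0}$    $\mathbb{Z}_{\ge 0}$
    $\mathbb{Z}_{<0}$    $\mathbb{Z}_{=0}$    $\mathbb{Z}_{>0}$
    $\emptyset$

Each abstract element of $\mathsf{Sign}$ represents the concrete values
that satisfy the respective property. For instance, denoting by $\gamma$ the function
that gives the “meaning” of an abstract element, we have
$\gamma(\mathbb{Z}_{<0}) = \{n \in \mathbb{Z} \mid n < 0\}$.
Conversely, $\alpha$ “abstracts” a concrete set of values to the least abstract property
describing it, for instance $\alpha(\{0; 1; 100\}) = \mathbb{Z}_{\ge 0}$.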

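Consider the simple program fragment $c \triangleq$ x := x + 1; x := x - 1. Its
denotational semantics $\llbracket c \rrbracket$ is the identity function
$\mathrm{id}_{\mathbb{Z}}$, so its best correct abstraction is the abstract identity
$\mathrm{id}_{\mathsf{Sign}} = \alpha\, \mathrm{id}_{\mathbb{Z}}\, \gamma$. This is an
extensional property of the program because it only depends on the function it
computes, i.e., its denotational semantics. However, an analyzer does not know the
semantics of $c$, so it has to analyze the program syntactically, breaking it down into
elementary pieces and gluing the results together. For instance, starting from the
concrete point $P = \{1\}$ the analysis first abstracts it to the property
$\alpha(P) = \mathbb{Z}_{>0}$, then it computes

$$\begin{aligned}
\llbracket c \rrbracket^{\sharp}_{\mathsf{Sign}}(\mathbb{Z}_{>0})
 &= \llbracket \texttt{x := x - 1} \rrbracket^{\sharp}_{\mathsf{Sign}}\,
    \llbracket \texttt{x := x + 1} \rrbracket^{\sharp}_{\mathsf{Sign}}(\mathbb{Z}_{>0}) \\
 &= \llbracket \texttt{x := x - 1} \rrbracket^{\sharp}_{\mathsf{Sign}}(\mathbb{Z}_{>0})
  = \mathbb{Z}_{\ge 0}.
\end{aligned}$$

Analogous calculations for all properties in $\mathsf{Sign}$ yield the abstraction

$$\llbracket c \rrbracket^{\sharp}_{\mathsf{Sign}}(a) =
\begin{cases}
\bot & \text{if } a = \bot \\
\mathbb{Z}_{\ge 0} & \text{if } a \in \{\mathbb{Z}_{=0}, \mathbb{Z}_{>0}, \mathbb{Z}_{\ge 0}\} \\
\mathbb{Z}_{<0} & \text{if } a = \mathbb{Z}_{<0} \\
\top & \text{if } a \in \{\mathbb{Z}_{\le 0}, \mathbb{Z}_{\neq 0}, \top\}
\end{cases}$$

that, albeit sound, is less precise than $\mathrm{id}_{\mathsf{Sign}}$ (the inputs on
which $\llbracket c \rrbracket^{\sharp}_{\mathsf{Sign}}$ loses accuracy are
$\mathbb{Z}_{=0}$, $\mathbb{Z}_{>0}$, $\mathbb{Z}_{\le 0}$ and $\mathbb{Z}_{\neq 0}$). If
instead the program were written as $c' \triangleq$ skip, the analysis in $\mathsf{Sign}$
would yield the best correct abstraction
$\llbracket c' \rrbracket^{\sharp}_{\mathsf{Sign}} = \mathrm{id}_{\mathsf{Sign}}$.
Therefore, the abstraction depends on how the program is written and not
only on its semantics: it is what is called an intensional property (see e.g. [1]
for more about intensional and extensional abstract properties).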
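
The calculation of Example 1 can be replayed mechanically. The following Python sketch is only an illustration under stated assumptions: it encodes the eight elements of Sign as subsets of three hypothetical atoms ("neg", "zero", "pos"), and it hard-codes the best correct abstractions of the two assignments on each atom (the tables `INC` and `DEC` are written by hand, not derived by the code). All names are introduced here and do not come from the paper.

```python
from itertools import combinations

# Sign encoded as the powerset of three atoms (an assumption of this sketch):
# frozenset() is the bottom element, {"neg", "zero"} stands for Z<=0,
# {"neg", "pos"} for Z!=0, and the full set for Z (top).
ATOMS = ("neg", "zero", "pos")
SIGN = [frozenset(s) for k in range(4) for s in combinations(ATOMS, k)]

# Hand-written best correct abstractions of the assignments on each atom.
INC = {"neg": {"neg", "zero"}, "zero": {"pos"}, "pos": {"pos"}}   # x := x + 1
DEC = {"neg": {"neg"}, "zero": {"neg"}, "pos": {"zero", "pos"}}   # x := x - 1


def lift(table):
    """Extend an atom-wise table to every element of Sign by taking unions."""
    def transfer(a):
        out = set()
        for atom in a:
            out |= table[atom]
        return frozenset(out)
    return transfer


inc_sharp, dec_sharp = lift(INC), lift(DEC)


def c_sharp(a):
    """Intensional abstraction: analyze x := x + 1, then x := x - 1."""
    return dec_sharp(inc_sharp(a))


def c_bca(a):
    """Extensional abstraction: the program computes the identity."""
    return a


# Inputs on which the compositional analysis is strictly less precise.
lossy = [a for a in SIGN if c_sharp(a) != c_bca(a)]
for a in lossy:
    print(sorted(a), "->", sorted(c_sharp(a)))
```

Under this encoding, the inputs collected in `lossy` correspond to the four elements on which the compositional analysis differs from the abstract identity, while the analysis of `skip` would coincide with `c_bca` everywhere.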


To overcome this limitation of “global” completeness, the concept of
local completeness [2], which is relative to a specific input, has recently been
proposed. While this condition is much more common in practice, it is also much
more complex to prove. In order to do so, the authors of [2] introduce a Local
Completeness Logic parametric with respect to an abstraction $A$ ($\mathrm{LCL}_A$ for
short), which is able to prove triples $\vdash_A [P]\ c\ [Q]$ with the following meaning:


1. $Q$ is an under-approximation of the concrete semantics $\llbracket c \rrbracket P$,
2. $Q$ and $\llbracket c \rrbracket P$ have the same over-approximation in $A$,
3. $A$ is locally complete for the intensional abstraction $\llbracket c \rrbracket^{\sharp}_{A}$ on input $P$.


The important consequence of the previous points is the fact that a triple in
$\mathrm{LCL}_A$ is able to prove both correctness and incorrectness of a program with
respect to a specification Spec expressible in $A$. By point (2), if the abstract
analysis reports no errors in $Q$ then there are none because of the
over-approximation. However, if the analysis does report an issue, this must be present
in the abstraction of $\llbracket c \rrbracket P$ as well, which is the same as the
abstraction of $Q$: this means that $Q$ contains a witness of the violation of Spec, and
this witness must be in $\llbracket c \rrbracket P$ because of the under-approximation
ensured by point (1). While local completeness of point (3) is a key property to prove
points (1-2), it would be enough to guarantee that (3) holds for the extensional best
correct approximation $\llbracket c \rrbracket^{A}$ of $\llbracket c \rrbracket$ rather
than for the intensional abstract interpreter $\llbracket c \rrbracket^{\sharp}_{A}$:
this suggests that it is possible to weaken the hypothesis (3) in order to make the
proof system able to derive more valid triples.

Main Contributions. Building on the proof system of $\mathrm{LCL}_A$, we add new rules
to relax point (3) to local completeness of the extensional abstraction
$\llbracket c \rrbracket^{A}$. This way, while the proof system itself remains
intensional as it deduces program properties by working inductively on the syntax, the
information it produces is more precise. Specifically, since the property associated
with triples is extensional, no precision is lost because of the intensional abstract
interpreter, which in the end allows us to prove more triples. In order to achieve this
goal, we introduce new rules to dynamically refine the abstract domain during the
analysis. While in general an analysis in a more concrete domain is more precise,
$\mathrm{LCL}_A$ requires local completeness, which is not necessarily preserved by
domain refinement [11]. For instance, a common way to combine two different abstract
domains is their reduced product [7], but it is not always the case that the analysis in
the reduced product is (locally) complete, even when it is such in the two domains.

To preserve local completeness, we introduce several rules for domain refinement
in $\mathrm{LCL}_A$ and compare their expressiveness and usability. All of them
provide extensional guarantees, in the sense that point (3) is replaced with local
completeness of the best correct abstraction $\llbracket c \rrbracket^{A}$ on input $P$.
The first one is called (refine-ext). $\mathrm{LCL}_A$ extended with (refine-ext) turns
out to be logically complete: any triple satisfying the above conditions (1-3) can be
proved in our proof system. This is a theoretical improvement with respect to
$\mathrm{LCL}_A$, which instead was intrinsically incomplete as a logic, i.e., for all
abstractions $A$ there exists a sound triple that cannot be proved. While (refine-ext)
is theoretically interesting, one of its hypotheses is infeasible to check in practice.
To improve applicability, we propose two derived rules, (refine-int) and (refine-pre),
whose premises can be checked effectively and imply the hypotheses of the more general
(refine-ext). Surprisingly, it turns out that (refine-int) enjoys a logical completeness
result too, while (refine-pre) is strictly weaker (in terms of strength of the logic, see
Example 6). Despite this, the latter is much simpler and preferable to use in
practice whenever possible (see Example 5), while the former can be used in
more situations and is at times the best choice.

We present a pictorial comparison among the expressiveness of the various
proof systems in Fig. 1. Each node represents the proof system $\mathrm{LCL}_A$ extended
with one rule (the bottom one being plain $\mathrm{LCL}_A$). An arrow in the picture
means a more powerful proof system, i.e., a proof system that can prove more
triples, with its label pointing out the result justifying the claim. The two arrows
between the two topmost nodes are because the two proof systems are logically
equivalent, i.e., they can prove the same triples.


Structure of the paper. In Section 2 we explain the notation used in the paper
and recall the basics of abstract interpretation. In Section 3 we present
$\mathrm{LCL}_A$, mostly summarizing the content of [2], with a focus on what is used in
the following sections. In Section 4 we present and compare our new rules to refine
the abstract domain, namely (refine-ext) and the two derived rules (refine-int)
and (refine-pre). We conclude in Section 5. Some proofs and technical examples
are in Appendix A.


[Figure: diagram whose recoverable nodes are $\mathrm{LCL}_A$ + (refine-ext),
$\mathrm{LCL}_A$ + (refine-int) and $\mathrm{LCL}_A$ + (refine-pre)]

Fig. 1: Relations between the new proof systems


2 Background 


Notation. We write $\wp(S)$ for the powerset of $S$ and $\mathrm{id}_S : S \to S$ for the
identity function on a set $S$, with subscripts omitted when obvious from the context. If
$f : S \to T$ is a function, we overload the symbol $f$ to denote also its lifting
$f : \wp(S) \to \wp(T)$ defined as $f(X) = \{f(x) \mid x \in X\}$ for any $X \subseteq S$.
Given two functions $f : S \to T$ and $g : T \to V$ we denote their composition as
$g \circ f$ or simply $gf$. For a function $f : S \to S$, we denote by $f^n : S \to S$ the
composition of $f$ with itself $n$ times, i.e. $f^0 = \mathrm{id}_S$ and
$f^{n+1} = f \circ f^n$.

In ordered structures, such as posets and lattices, with carrier set $C$, we
denote the ordering with $\le_C$, least upper bounds (lubs) with $\sqcup_C$, greatest lower
bounds (glbs) with $\sqcap_C$, least element with $\bot_C$ and greatest element with
$\top_C$. For all these, we omit the subscript when evident from the context. Any powerset
is a complete lattice ordered by set inclusion. In this case, we use standard symbols
$\subseteq$, $\cup$, etc. Given a poset $T$ and two functions $f, g : S \to T$, the notation
$f \le g$ means that, for all $s \in S$, $f(s) \le_T g(s)$. A function $f$ between complete
lattices is additive (resp. co-additive) if it preserves arbitrary lubs (resp. glbs).


2.1 Abstract Interpretation 


Abstract interpretation [6,7,5] is a general framework to define static analyses
that are sound by construction. The main idea is to approximate the program
semantics on some abstract domain $A$ instead of working on the concrete domain
$C$. The main tools used to study abstract interpretations are Galois connections.
Given two complete lattices $C$ and $A$, a pair of monotone functions $\alpha : C \to A$
and $\gamma : A \to C$ define a Galois connection (GC) when

$$\forall c \in C,\ a \in A.\quad \alpha(c) \le_A a \iff c \le_C \gamma(a).$$

We call $C$ and $A$ the concrete and the abstract domain respectively, $\alpha$ the
abstraction function and $\gamma$ the concretization function. The functions $\alpha$ and
$\gamma$ are also called adjoints. For any GC, it holds that
$\mathrm{id}_C \le \gamma\alpha$, $\alpha\gamma \le \mathrm{id}_A$, $\gamma$ is co-additive
and $\alpha$ is additive. A concrete value $c \in C$ is called expressible in $A$ if
$\gamma\alpha(c) = c$. We only consider GCs in which $\alpha\gamma = \mathrm{id}_A$, called
Galois insertions (GIs). In a GI $\alpha$ is onto and $\gamma$ is injective. A GI is said
to be trivial if $A$ is isomorphic to the concrete domain or if it is the singleton
$\{\top_A\}$.
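
As a sanity check of these definitions, the Galois connection condition can be tested exhaustively on a small instance. The sketch below is an illustration under explicit assumptions: the integers are replaced by the finite universe {-2, ..., 2}, Sign is encoded as subsets of three hypothetical atoms, and all function names are introduced here rather than taken from the paper.

```python
from itertools import combinations

ATOMS = ("neg", "zero", "pos")
UNIVERSE = range(-2, 3)  # finite stand-in for Z (an assumption of this sketch)


def powerset(xs):
    xs = list(xs)
    return [frozenset(s) for k in range(len(xs) + 1) for s in combinations(xs, k)]


SIGN = powerset(ATOMS)         # the 8 abstract elements
CONCRETE = powerset(UNIVERSE)  # the 32 concrete sets, ordered by inclusion


def atom_of(n):
    return "neg" if n < 0 else "zero" if n == 0 else "pos"


def alpha(c):
    """Least abstract element describing the concrete set c."""
    return frozenset(atom_of(n) for n in c)


def gamma(a):
    """Concrete values (within UNIVERSE) described by the abstract element a."""
    return frozenset(n for n in UNIVERSE if atom_of(n) in a)


def closure(c):
    """The closure operator gamma . alpha on the concrete domain."""
    return gamma(alpha(c))


# Galois connection: alpha(c) <= a  iff  c <= gamma(a), for all c and a.
is_gc = all((alpha(c) <= a) == (c <= gamma(a)) for c in CONCRETE for a in SIGN)

# Galois insertion: alpha . gamma is the identity on the abstract domain.
is_gi = all(alpha(gamma(a)) == a for a in SIGN)

# gamma . alpha is increasing and idempotent.
is_closure = all(
    c <= closure(c) and closure(closure(c)) == closure(c) for c in CONCRETE
)

# Concrete sets that are expressible, i.e. fixed by the closure operator.
expressible = [c for c in CONCRETE if closure(c) == c]

print(is_gc, is_gi, is_closure, len(expressible))
```

In this finite instance the expressible sets are exactly the images of `gamma`, so there is one per abstract element; a set such as `{1}` is not expressible because its closure also contains `2`.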

We overload the symbol $A$ to denote also the function $\gamma\alpha : C \to C$: this
is always a closure operator, that is, a monotone, increasing (i.e. $c \le A(c)$ for
all $c$) and idempotent function. In the following, we use closure operators as
much as possible to simplify the notation. In particular, they are useful to denote
domain refinements, as exemplified in the next paragraph. Note that they are
still very expressive because $\gamma$ is injective: for instance $A(c) = A(c')$ if and
only if $\alpha(c) = \alpha(c')$. Nonetheless, the use of closure operators is only a
matter of notation and it is always possible to rewrite them using the adjoints.

We use $\mathrm{Abs}(C)$ to denote the set of abstract domains over $C$, and we write
$A_{\alpha,\gamma} \in \mathrm{Abs}(C)$ when we need to make the two maps $\alpha$ and
$\gamma$ explicit (we omit them when not needed). Given two abstract domains
$A_{\alpha,\gamma}, A'_{\alpha',\gamma'} \in \mathrm{Abs}(C)$ over $C$, we say $A'$ is a
refinement of $A$, written $A' \preceq A$, when $\gamma(A) \subseteq \gamma'(A')$.
When this happens, the abstract domain $A'$ is more expressive than $A$, and in
particular for all concrete elements $c \in C$ the inequality $A'(c) \le_C A(c)$ holds.


Abstracting Functions. Given a monotone function $f : C \to C$ and an abstract
domain $A_{\alpha,\gamma} \in \mathrm{Abs}(C)$, a function $f^{\sharp} : A \to A$ is a
sound approximation (or abstraction) of $f$ if $\alpha f \le f^{\sharp} \alpha$. Its best
correct approximation (bca) is $f^{A} \triangleq \alpha f \gamma$, and it is the most
precise of all the sound approximations of $f$: a function $f^{\sharp}$ is a sound
approximation of $f$ if and only if $f^{A} \le f^{\sharp}$.

A sound abstraction $f^{\sharp}$ of $f$ is complete if $\alpha f = f^{\sharp} \alpha$. It
turns out that there exists a complete abstraction $f^{\sharp}$ if and only if the bca
$f^{A}$ is complete. If this is the case, we say that the abstract domain $A$ is complete
for $f$ and denote it with $\mathbb{C}^{A}(f)$. Intuitively, completeness means that the
abstract function $f^{\sharp}$ is as precise as possible in the given abstract domain $A$,
and in program analysis this allows greater confidence in the alarms raised. We remark
that $A$ is complete for $f$ if and only if
$\alpha f = f^{A} \alpha = \alpha f \gamma \alpha$. Since $\gamma$ is injective, this is
true if and only if $\gamma \alpha f = \gamma \alpha f \gamma \alpha$, so that we define
the (global) completeness property $\mathbb{C}^{A}(f)$ as follows:

$$\mathbb{C}^{A}(f) \iff A f = A f A.$$


2.2 Regular Commands

Following [2] (see also [16]) we consider a language of regular commands:

$$\mathsf{Reg} \ni r ::= e \mid r; r \mid r \oplus r \mid r^{*}$$




This is a general language and can be instantiated differently by changing the set
$\mathsf{Exp}$ of basic transfer expressions $e$. These determine the kind of operations
allowed in the language; in our examples we assume deterministic
assignments and boolean guards. Using standard definitions for arithmetic and
boolean expressions $a \in \mathsf{AExp}$ and $b \in \mathsf{BExp}$, we consider

$$\mathsf{Exp} \ni e ::= \texttt{skip} \mid \texttt{x := a} \mid \texttt{b?}$$


skip does nothing, and x := a is a standard deterministic assignment. The semantics
of b? is that of an “assume” statement: if its input satisfies b it does nothing,
otherwise it diverges. The term $r; r$ represents the usual sequential composition,
and $r \oplus r$ is nondeterministic choice. The Kleene star $r^{*}$ denotes a
nondeterministic iteration, where $r$ can be executed any number of times (possibly 0)
before exiting. It can be thought of as the solution of the recursive equation
$r^{*} = \texttt{skip} \oplus (r; r^{*})$. We write $r^{n}$ to denote the sequential
composition of $r$ with itself $n$ times, analogously to how we use $f^{n}$ for function
composition.

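This formulation can accommodate a standard imperative programming
language [18] by defining if and while statements as

$$\begin{aligned}
\texttt{if } (b) \texttt{ then } c_1 \texttt{ else } c_2 &\triangleq (b?; c_1) \oplus ((\neg b)?; c_2) \\
\texttt{while } (b) \texttt{ do } c &\triangleq (b?; c)^{*}; (\neg b)?
\end{aligned}$$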


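Concrete semantics. We assume the semantics
$\llparenthesis \cdot \rrparenthesis : \mathsf{Exp} \to C \to C$ of basic
transfer expressions on a complete lattice $C$ to be additive. We believe this
assumption is not restrictive, and it is always satisfied by collecting semantics.
For our instantiation of $\mathsf{Exp}$, we consider a finite set of variables
$\mathrm{Var}$, then the set of stores $\Sigma = \mathrm{Var} \to \mathbb{Z}$ that are
(total) functions $\sigma$ from $\mathrm{Var}$ to integers. The complete lattice $C$ is
then defined simply as $\wp(\Sigma)$ with the usual poset structure given by set
inclusion. Given a store $\sigma \in \Sigma$, store update $\sigma[x \mapsto v]$ is
defined as usual for $x \in \mathrm{Var}$ and $v \in \mathbb{Z}$. We consider standard,
inductively defined semantics $\llparenthesis \cdot \rrparenthesis$ for arithmetic and
boolean expressions. The concrete semantics of regular commands
$\llbracket \cdot \rrbracket : \mathsf{Reg} \to C \to C$ is defined inductively as in
Fig. 2a, where the semantics of basic transfer expressions $e \in \mathsf{Exp}$ is
defined as follows:

$$\begin{aligned}
\llparenthesis \texttt{skip} \rrparenthesis S &\triangleq S \\
\llparenthesis \texttt{x := a} \rrparenthesis S &\triangleq \{\sigma[x \mapsto \llparenthesis a \rrparenthesis \sigma] \mid \sigma \in S\} \\
\llparenthesis \texttt{b?} \rrparenthesis S &\triangleq \{\sigma \in S \mid \llparenthesis b \rrparenthesis \sigma = \mathit{tt}\}
\end{aligned}$$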


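Abstract Semantics. The (compositional) abstract semantics of regular commands
$\llbracket \cdot \rrbracket^{\sharp}_A : \mathsf{Reg} \to A \to A$ on an
abstract domain $A \in \mathsf{Abs}(C)$ is defined inductively as in Fig. 2b. As
is common for abstract interpreters, we assume the analyser knows the best
correct abstraction of expressions and thus is able to compute
$\llbracket e \rrbracket^A$. A straightforward proof by structural induction
shows that the abstract semantics is sound w.r.t. $\llbracket r \rrbracket$
(i.e., $\alpha \llbracket r \rrbracket \leq \llbracket r \rrbracket^{\sharp}_A \alpha$)
and monotone. However, in general it is less precise than the bca, i.e.,
$\llbracket r \rrbracket^{\sharp}_A \neq \llbracket r \rrbracket^A = \alpha \llbracket r \rrbracket \gamma$.

(a) Concrete semantics
$\llbracket e \rrbracket c \triangleq \llparenthesis e \rrparenthesis c$
$\llbracket r_1; r_2 \rrbracket c \triangleq \llbracket r_2 \rrbracket (\llbracket r_1 \rrbracket c)$
$\llbracket r_1 \oplus r_2 \rrbracket c \triangleq \llbracket r_1 \rrbracket c \vee \llbracket r_2 \rrbracket c$
$\llbracket r^* \rrbracket c \triangleq \bigvee_{n \geq 0} \llbracket r \rrbracket^n c$

(b) Abstract semantics
$\llbracket e \rrbracket^{\sharp}_A a \triangleq \llbracket e \rrbracket^A a = \alpha(\llparenthesis e \rrparenthesis \gamma(a))$
$\llbracket r_1; r_2 \rrbracket^{\sharp}_A a \triangleq \llbracket r_2 \rrbracket^{\sharp}_A (\llbracket r_1 \rrbracket^{\sharp}_A a)$
$\llbracket r_1 \oplus r_2 \rrbracket^{\sharp}_A a \triangleq \llbracket r_1 \rrbracket^{\sharp}_A a \vee_A \llbracket r_2 \rrbracket^{\sharp}_A a$
$\llbracket r^* \rrbracket^{\sharp}_A a \triangleq \bigvee_{A,\, n \geq 0} (\llbracket r \rrbracket^{\sharp}_A)^n a$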


Fig. 2: Concrete and abstract semantics of regular commands, side by side 
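
The concrete semantics of regular commands can be sketched as a small interpreter. This is only an illustration under stated assumptions: sets of stores are finite, the class and function names (`Skip`, `Assign`, `Assume`, `Seq`, `Choice`, `Star`, `run`, `if_then_else`, `while_do`) are ours and not from the paper, and the Kleene star is computed as a least fixpoint, so it terminates only when the reachable set of stores is finite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

# A store is kept immutable (sorted (variable, value) pairs) so that sets of
# stores can be represented as frozensets.
Store = Tuple[Tuple[str, int], ...]
States = FrozenSet[Store]


def store(**values: int) -> Store:
    return tuple(sorted(values.items()))


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Assign:  # x := a
    var: str
    expr: Callable[[Dict[str, int]], int]


@dataclass(frozen=True)
class Assume:  # b?
    cond: Callable[[Dict[str, int]], bool]


@dataclass(frozen=True)
class Seq:  # r1; r2
    first: object
    second: object


@dataclass(frozen=True)
class Choice:  # nondeterministic choice of r1 and r2
    left: object
    right: object


@dataclass(frozen=True)
class Star:  # r*
    body: object


def run(r, S: States) -> States:
    """Collecting semantics of r on a finite set of stores S."""
    if isinstance(r, Skip):
        return S
    if isinstance(r, Assign):
        return frozenset(store(**{**dict(s), r.var: r.expr(dict(s))}) for s in S)
    if isinstance(r, Assume):
        return frozenset(s for s in S if r.cond(dict(s)))
    if isinstance(r, Seq):
        return run(r.second, run(r.first, S))
    if isinstance(r, Choice):
        return run(r.left, S) | run(r.right, S)
    if isinstance(r, Star):
        acc = S
        while True:  # accumulate S, r(S), r(r(S)), ... until nothing new appears
            nxt = acc | run(r.body, acc)
            if nxt == acc:
                return acc
            acc = nxt
    raise TypeError(f"not a regular command: {r!r}")


def if_then_else(b, c1, c2):
    return Choice(Seq(Assume(b), c1), Seq(Assume(lambda s: not b(s)), c2))


def while_do(b, c):
    return Seq(Star(Seq(Assume(b), c)), Assume(lambda s: not b(s)))


# x := y; while (x > 1) { y := y - 1; x := x - 1 }
loop_body = Seq(Assign("y", lambda s: s["y"] - 1), Assign("x", lambda s: s["x"] - 1))
prog = Seq(Assign("x", lambda s: s["y"]), while_do(lambda s: s["x"] > 1, loop_body))
result = run(prog, frozenset(store(x=0, y=m) for m in range(1, 6)))
```

On the five initial stores with y between 1 and 5, every execution ends in the single store x = 1, y = 1, which `result` contains.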


Shorthands. Throughout the paper, we present some simple examples of program
analysis. The programs discussed in the examples contain just one or two
variables (usually x and y), so we denote their sets of stores just as
$\Sigma = \mathbb{Z}$ or $\Sigma = \mathbb{Z}^2$. In these cases, the convention
is that an element of $\mathbb{Z}$ is the value of the single variable in Var,
and a pair $(n, m) \in \mathbb{Z}^2$ denotes the store $\sigma(x) = n$,
$\sigma(y) = m$. We also lift these conventions to sets of values in
$\mathbb{Z}$ or $\mathbb{Z}^2$. At times, to improve readability, we use logical
formulas such as $(y \in \{1; 2; 99\} \wedge x = y)$, possibly using intervals,
as in $x \in [0; 5]$, to describe sets of stores.


3 Local Completeness Logic 


In this section we present the notion of local completeness and introduce the
proof system $\mathrm{LCL}_A$ (Local Completeness Logic on A) as defined in [2].

For a generic program and abstract domain, global completeness is too strong
a requirement: for conditionals to be complete the abstract domain should
basically contain a complete sublattice of the concrete domain. For this reason,
the weaker notion of local completeness can be more convenient in many cases.


Definition 1 (Local completeness, cf. [2]). Let $f : C \to C$ be a concrete
function, $c \in C$ a concrete point and $A \in \mathsf{Abs}(C)$ an abstract
domain for C. Then A is locally complete for f on c, written
$\mathbb{C}^A_c(f)$, iff


$A f(c) = A f A(c)$.
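
Definition 1 can be checked mechanically on small instances. The following is a minimal sketch, assuming finite sets of integers as concrete points and the interval domain seen as a closure (the hull of a set); the helper names `interval_hull`, `assume` and `locally_complete` are ours.

```python
def interval_hull(S):
    """Int(S) as a closure: all integers between min(S) and max(S)."""
    S = frozenset(S)
    return frozenset(range(min(S), max(S) + 1)) if S else frozenset()


def assume(pred):
    """Concrete semantics of a guard b?: keep only the values satisfying pred."""
    return lambda S: frozenset(x for x in S if pred(x))


def locally_complete(closure, f, c):
    """Check A f(c) = A f A(c) for the closure A, function f and point c."""
    return closure(f(c)) == closure(f(closure(c)))


nonzero = assume(lambda x: x != 0)  # (x != 0)?
nonneg = assume(lambda x: x >= 0)   # (x >= 0)?

on_nonzero = locally_complete(interval_hull, nonzero, {-1, 1})
on_nonneg = locally_complete(interval_hull, nonneg, {-1, 1})
on_nonneg_other_input = locally_complete(interval_hull, nonneg, {0, 1})
```

The same guard `(x >= 0)?` is locally complete on the input {0, 1} but not on {-1, 1}: abstracting {-1, 1} to the interval [-1; 1] introduces the value 0, which survives the guard and changes the abstract result from [1; 1] to [0; 1]. This dependence on the input point is what distinguishes local from global completeness.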


A remarkable difference between global and local completeness is that, while the
former can be proved compositionally irrespective of the input [10], the latter
needs it. Consequently, to carry out a compositional proof of local completeness,
information on the input to each subpart of the program is also required, i.e., all
traversed states are important. However, local completeness enjoys an "abstract
convexity" property, that is, local completeness on a concrete point c implies
local completeness on any concrete point d between c and its abstraction A(c).
This observation has been exploited in the design of the proof system
$\mathrm{LCL}_A$. The system is able to prove triples $\vdash_A [P]\ r\ [Q]$
ensuring that:

1. Q is an under-approximation of the concrete semantics $\llbracket r \rrbracket P$,
2. Q and $\llbracket r \rrbracket P$ have the same over-approximation in A,
3. A is locally complete for $\llbracket r \rrbracket$ on input P.


$\frac{\mathbb{C}^A_P(\llbracket e \rrbracket)}{\vdash_A [P]\ e\ [\llbracket e \rrbracket P]}\ (\mathsf{transfer})$
$\frac{P' \leq P \leq A(P') \quad \vdash_A [P']\ r\ [Q'] \quad Q \leq Q' \leq A(Q)}{\vdash_A [P]\ r\ [Q]}\ (\mathsf{relax})$
$\frac{\vdash_A [P]\ r_1\ [R] \quad \vdash_A [R]\ r_2\ [Q]}{\vdash_A [P]\ r_1; r_2\ [Q]}\ (\mathsf{seq})$
$\frac{\vdash_A [P]\ r_1\ [Q_1] \quad \vdash_A [P]\ r_2\ [Q_2]}{\vdash_A [P]\ r_1 \oplus r_2\ [Q_1 \vee Q_2]}\ (\mathsf{join})$
$\frac{\vdash_A [P]\ r\ [R] \quad \vdash_A [P \vee R]\ r^*\ [Q]}{\vdash_A [P]\ r^*\ [Q]}\ (\mathsf{rec})$
$\frac{\vdash_A [P]\ r\ [Q] \quad Q \leq A(P)}{\vdash_A [P]\ r^*\ [P \vee Q]}\ (\mathsf{iterate})$

Fig. 3: The proof system $\mathrm{LCL}_A$.


The second point means that, given a specification Spec expressible in A, any
provable triple $\vdash_A [P]\ r\ [Q]$ either proves correctness of r with
respect to Spec or exposes some alerts in $Q \setminus Spec$. These in turn
correspond to true ones because of the first point, as spelled out by
Corollary 1 below.

The proof system is defined in Fig. 3. The crux of the proof system is to
constrain the under-approximation Q to have the same abstraction as the concrete
semantics $\llbracket r \rrbracket P$, as for instance explicitly required in
rule (relax). This, by the abstract convexity property mentioned above, means
that local completeness of $\llbracket r \rrbracket$ on the under-approximation
P of the concrete store is enough to prove local completeness.

The three key properties (1-3) listed above are formalized by the following 
(intensional) soundness result: 


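Theorem 1 (Soundness, cf. [2]). Let $A_{\alpha,\gamma} \in \mathsf{Abs}(C)$. If
$\vdash_A [P]\ r\ [Q]$ then:


1. $Q \leq \llbracket r \rrbracket P$,
2. $\alpha(\llbracket r \rrbracket P) = \alpha(Q)$,
3. $\llbracket r \rrbracket^{\sharp}_A \alpha(P) = \alpha(Q)$.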


As a consequence of this theorem, given a specification expressible in the abstract
domain A, a provable triple $\vdash_A [P]\ r\ [Q]$ can determine both
correctness and incorrectness of the program r:


Corollary 1 (Proofs of Verification, cf. [2]). Let
$A_{\alpha,\gamma} \in \mathsf{Abs}(C)$ and $a \in A$. If
$\vdash_A [P]\ r\ [Q]$ then

$\llbracket r \rrbracket P \leq \gamma(a) \iff Q \leq \gamma(a)$.

The corollary is useful in program analysis and verification because, given a
specification a expressible in A and a provable triple $\vdash_A [P]\ r\ [Q]$,
it allows us to distinguish two cases.


- If $Q \subseteq \gamma(a)$, then we also have
  $\llbracket r \rrbracket P \subseteq \gamma(a)$, so that the program is
  correct with respect to the specification.
- If $Q \not\subseteq \gamma(a)$, then also
  $\llbracket r \rrbracket P \not\subseteq \gamma(a)$, which means that
  $\llbracket r \rrbracket P \setminus \gamma(a)$ is not empty and thus contains
  a true alert of the program. Moreover, since
  $Q \subseteq \llbracket r \rrbracket P$ we have that
  $Q \setminus \gamma(a) \subseteq \llbracket r \rrbracket P \setminus \gamma(a)$,
  so that Q already pinpoints some issues.


To better show how this works, we briefly introduce the following example
(also discussed in [2], where all details of the derivation can be found).


Example 2. Consider the concrete domain $C = \mathcal{P}(\mathbb{Z})$, the
abstract domain Int of intervals, the precondition $P = \{1; 999\}$ and the
command $r \triangleq (r_1 \oplus r_2)^*$, where

$r_1 \triangleq (x > 0)?;\ x := x - 1$
$r_2 \triangleq (x < 1000)?;\ x := x + 1$

In $\mathrm{LCL}_A$ it is possible to prove the triple
$\vdash_{Int} [P]\ r\ [Q]$, whose postcondition is $Q = \{0; 2; 1000\}$.
Consider the two specifications $Spec = (x \leq 1000)$ and
$Spec' = (x \geq 100)$. The triple is then able to prove correctness of Spec
and incorrectness of Spec'. For the former, observe that $Q \subseteq Spec$. By
Corollary 1 we then know $\llbracket r \rrbracket P \subseteq Spec$, that is
correctness. For the latter, Q exhibits two witnesses to the violation of
Spec', namely $0, 2 \in Q \setminus Spec'$. By point (1) of soundness we then
know that $0, 2 \in Q \subseteq \llbracket r \rrbracket P$ are true alerts.
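
The claims of Example 2 can be replayed by brute force, since the reachable set is finite. The sketch below does not build an LCL derivation; it only computes the concrete semantics by fixpoint iteration and checks points (1) and (2) of Theorem 1 together with the two specifications. The specifications are taken as x ≤ 1000 and x ≥ 100, which is our reading of the example; all function names are ours.

```python
def step(S):
    """One execution of r1 or r2 from the states in S."""
    r1 = {x - 1 for x in S if x > 0}     # (x > 0)?; x := x - 1
    r2 = {x + 1 for x in S if x < 1000}  # (x < 1000)?; x := x + 1
    return r1 | r2


def star(S):
    """Concrete semantics of (r1 + r2)*: least fixpoint above S."""
    acc = set(S)
    while True:
        nxt = acc | step(acc)
        if nxt == acc:
            return acc
        acc = nxt


def hull(S):
    """Interval abstraction of a nonempty finite set, as (low, high)."""
    return (min(S), max(S))


P = {1, 999}
Q = {0, 2, 1000}
post = star(P)  # the concrete semantics of r on P

under_approximation = Q <= post                # point (1)
same_abstraction = hull(Q) == hull(post)       # point (2)
spec_holds = all(x <= 1000 for x in post)      # Spec is satisfied
true_alerts = {x for x in Q if not x >= 100}   # witnesses against Spec'
```

Here `post` is the whole range 0..1000, so Q is a three-element under-approximation with the same interval abstraction, and `true_alerts` contains exactly the two witnesses 0 and 2 named in the example.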


Strictly speaking, the proof of Corollary 1 only relies on points (1-2) of
Theorem 1. Point (3) is in turn needed to ensure the first two, but extensional
completeness would suffice to this aim. This means that we can weaken the
soundness theorem (logically speaking, its conclusion becomes weaker, so the
theorem as an implication is weaker) while still preserving the validity of
Corollary 1. To this end, we propose a new soundness result involving
extensional completeness: the important difference is that in point (3) we use
the best correct abstraction $\llbracket r \rrbracket^A$ in place of the
inductively defined $\llbracket r \rrbracket^{\sharp}_A$. Theorem 1 involves
$\llbracket r \rrbracket^{\sharp}_A$, an intensional property of the program r
that depends on how the program is written (see Example 1, or Example 1 in
Section 5 of [13]), while the new statement we propose relies only on
$\llbracket r \rrbracket^A$, an extensional property of the computed function
$\llbracket r \rrbracket$ and not of r itself. For this reason, in the rest of
the paper we use the name intensional soundness for Theorem 1, and extensional
soundness for the following Theorem 2.


Theorem 2 (Extensional soundness). Let $A_{\alpha,\gamma} \in \mathsf{Abs}(C)$.
If $\vdash_A [P]\ r\ [Q]$ then:


1. $Q \leq \llbracket r \rrbracket P$,
2. $\alpha(\llbracket r \rrbracket P) = \alpha(Q)$,
3. $\llbracket r \rrbracket^A \alpha(P) = \alpha(Q)$.


Lastly, we remark that the original $\mathrm{LCL}_A$ is intrinsically logically
incomplete ([2], cf. Theorem 5.12): for every nontrivial abstraction A there
exists a triple that is intensionally sound (satisfies points (1-3) of
Theorem 1) but cannot be proved in $\mathrm{LCL}_A$. We will discuss logical
(in)completeness for our extensional framework in Section 4.1.

$\frac{\vdash_{A'} [P]\ r\ [Q] \quad A' \preceq A \quad A \llbracket r \rrbracket^{A'} A(P) = A(Q)}{\vdash_A [P]\ r\ [Q]}\ (\mathsf{refine\text{-}ext})$

Fig. 4: Rule refine for $\mathrm{LCL}_A$.


4 Refining Abstract Domain 


$\mathrm{LCL}_A$ can prove a triple [P] r [Q] for some Q only when
$\llbracket r \rrbracket^{\sharp}_A$ is locally complete, that is
$\llbracket r \rrbracket^{\sharp}_A \alpha(P) = \alpha(\llbracket r \rrbracket P)$
(see Theorem 1). Since $\llbracket r \rrbracket^{\sharp}_A$ is computed in a
compositional way, the above condition strictly depends on how r is written: to
prove the local completeness of $\llbracket r \rrbracket^{\sharp}_A$, we need to
prove that all its syntactic components are locally complete, which is an
intensional property. However, the goal of the analysis is to study the
behaviour of the function $\llbracket r \rrbracket$, not how it is encoded by r.
Hence, our aim is to enhance the original proof system so that it can handle
triples where the extensional abstraction $\llbracket r \rrbracket^A$ is proved
to be locally complete w.r.t. the given input, that is
$\llbracket r \rrbracket^A \alpha(P) = \alpha(\llbracket r \rrbracket P)$. To
this end, we extend the proof system with a new inference rule, shown in Fig. 4.
It is named "refine" because it allows refining the abstract domain A to some
$A' \preceq A$, and "ext" since it involves the extensional bca
$\llbracket r \rrbracket^{A'}$ of $\llbracket r \rrbracket$ in A' (to
distinguish it from the rules we will introduce in Section 4.2).

Using (refine-ext) it is possible to construct a derivation that proves local
completeness of portions of the whole program in a more precise abstract domain
A' and then carries the result over to the global analysis in a coarser domain A.
The only requirement for the application of the rule is that domain A' is chosen
in such a way that $A \llbracket r \rrbracket^{A'} A(P) = A(Q)$ is satisfied.


Formally, given the two abstract domains
$A_{\alpha,\gamma}, A'_{\alpha',\gamma'} \in \mathsf{Abs}(C)$, this last premise
of rule (refine-ext) should be written as
$\alpha \gamma' \llbracket r \rrbracket^{A'} \alpha' A(P) = \alpha(Q)$ to match
function domains and codomains. However we prefer the more concise, albeit
slightly imprecise, notation used in Fig. 4. That notation is justified by the
following intuitive argument: since $A' \preceq A$ we can consider, with a
slight abuse of notation (seeing abstract domains as closures),
$A \subseteq A' \subseteq C$, so that for any element
$a \in A \subseteq C$ we have $\gamma(a) = \gamma'(a) = a$ and for any
$c \in C$ we have $\alpha' A(c) = A(c)$. With these, it follows that


$\alpha \gamma' \llbracket r \rrbracket^{A'} \alpha' A(P) = \alpha \llbracket r \rrbracket^{A'} A(P) = A \llbracket r \rrbracket^{A'} A(P)$.


With rule (refine-ext) we cannot prove intensional soundness (Theorem 1):
since this rule allows performing part of the analysis in a more concrete domain
A', we do not get any information on $\llbracket r \rrbracket^{\sharp}_A$.
However, we can prove extensional soundness (Theorem 2) and get all the benefits
of Corollary 1.


Theorem 3 (Extensional soundness of (refine-ext)). The proof system in
Fig. 3 with the addition of rule (refine-ext) (see Fig. 4) is extensionally sound
(cf. Theorem 2).

We also remark that a rule like (refine-ext), which allows carrying out part of
the proof in a different abstract domain, cannot come unconstrained. We present
an example showing that a similar inference rule, only requiring the triple
[P] r [Q] to be provable in an abstract domain $A' \preceq A$ without any other
constraint, would be unsound.


Example 3. Consider the concrete domain $C = \mathcal{P}(\mathbb{Z})$ of
integers, the point $P = \{-5; -1\}$, the abstract domain Sign of Example 1 and
the program


$r \triangleq x := x + 10$.


Then $C \preceq \mathsf{Sign}$ and we can prove
$\vdash_C [P]\ r\ [\{5; 9\}]$ by applying (transfer), since all assignments are
locally complete in the concrete domain. However, if
$f = \llbracket r \rrbracket = \llparenthesis x := x + 10 \rrparenthesis$, it is
not the case that $\mathbb{C}^{\mathsf{Sign}}_P(f)$: indeed


$\mathsf{Sign}(f(\mathsf{Sign}(P))) = \mathsf{Sign}(f(\mathbb{Z}_{<0})) = \mathsf{Sign}(\{n \in \mathbb{Z} \mid n < 10\}) = \top$
while
$\mathsf{Sign}(f(P)) = \mathsf{Sign}(\{5, 9\}) = \mathbb{Z}_{>0}$.


This means that a rule without any additional condition can prove a triple which
is not locally complete, hence it is unsound.
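
The two computations of Example 3 can be reproduced with a small executable model of Sign. This is a sketch under explicit assumptions: the integers are replaced by the finite window -50..50 (enough for this example, since no value leaves the window), Sign is taken to consist of the eight elements listed in the dictionary, and the names `WINDOW`, `SIGN`, `gamma`, `alpha` and `shift` are ours.

```python
WINDOW = frozenset(range(-50, 51))  # finite stand-in for Z (assumption of this sketch)

# Elements of Sign, each described by the predicate defining its concretization.
SIGN = {
    "bot": lambda n: False,
    "<0": lambda n: n < 0,
    "=0": lambda n: n == 0,
    ">0": lambda n: n > 0,
    "<=0": lambda n: n <= 0,
    "!=0": lambda n: n != 0,
    ">=0": lambda n: n >= 0,
    "top": lambda n: True,
}


def gamma(a):
    """Concretization of the Sign element a, restricted to the window."""
    return frozenset(n for n in WINDOW if SIGN[a](n))


def alpha(S):
    """Best abstraction: the smallest Sign element whose concretization contains S."""
    S = frozenset(S) & WINDOW
    candidates = [a for a in SIGN if S <= gamma(a)]
    return min(candidates, key=lambda a: len(gamma(a)))


def shift(S):
    """Concrete semantics of x := x + 10."""
    return frozenset(n + 10 for n in S)


P = {-5, -1}
through_abstraction = alpha(shift(gamma(alpha(P))))  # Sign(f(Sign(P)))
directly = alpha(shift(P))                           # Sign(f(P))
```

Going through the abstraction of P yields `"top"`, while abstracting the concrete result yields `">0"`, so the local completeness equation fails on P even though the triple is provable in the concrete domain.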


4.1 Logical Completeness 


Among all the possible conditions that can be added to a rule like (refine-ext),
we believe ours to be very general since, unlike the original $\mathrm{LCL}_A$
proof system (see Section 5.2 of [2]), the introduction of (refine-ext) allows us
to derive a logical completeness result, i.e., the ability to prove any triple
satisfying the soundness properties guaranteed by the proof system.

However, to prove such a result, our extension needs an additional rule to
handle loops, just like the original $\mathrm{LCL}_A$ and Incorrectness
Logic [16]. The necessary infinitary rule, called (limit), allows the proof
system to handle Kleene star, and is the same as in $\mathrm{LCL}_A$:


$\frac{\forall n \in \mathbb{N}.\ \vdash_A [P_n]\ r\ [P_{n+1}]}{\vdash_A [P_0]\ r^*\ [\bigvee_{i \in \mathbb{N}} P_i]}\ (\mathsf{limit})$


Theorem 4 (Logical completeness of (refine-ext)). Consider the proof system of
Fig. 3 with the addition of rules (refine-ext) and (limit). If
$Q \leq \llbracket r \rrbracket P$ and
$\llbracket r \rrbracket^A \alpha(P) = \alpha(Q)$ then $\vdash_A [P]\ r\ [Q]$.


The previous theorem proves the logical completeness of our proof system with
respect to the property of extensional soundness. Indeed, if
$Q \leq \llbracket r \rrbracket P$ and
$\llbracket r \rrbracket^A \alpha(P) = \alpha(Q)$ we also have:


$\alpha(Q) \leq \alpha(\llbracket r \rrbracket P) \leq \llbracket r \rrbracket^A \alpha(P) = \alpha(Q)$,

hence all three conditions of Theorem 2 are satisfied.

An interesting consequence of this result is the existence of a refinement A'
in which it is possible to carry out the proof. In principle such a refinement
could be the concrete domain C (as shown in the proof in Appendix A), which
is not computable. However, it is worth noting that for a sequential fragment
(a portion of code without loops) the concrete domain can actually be used
(for instance via first-order logic). This opens up the possibility, for instance,
to infer a loop invariant on the body using C, and then prove it using an
abstract domain. In Section 4.3 we discuss this issue further.


4.2 Derived Refinement Rules 


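The hypothesis $A \llbracket r \rrbracket^{A'} A(P) = A(Q)$ is added to rule
(refine-ext) in order to guarantee soundness: in general, the ability to prove a
triple such as [P] r [Q] in a refined domain A' only gives information on
$A \llbracket r \rrbracket^{A'} A'(P)$ but not on
$A \llbracket r \rrbracket^{A'} A(P)$. In fact, Example 4 shows that
$A \llbracket r \rrbracket^{A'} A'(P)$ and
$A \llbracket r \rrbracket^{A'} A(P)$ can be different.


Example 4. Consider the concrete domain $\mathcal{P}(\mathbb{Z})$, the abstract
domain of signs $\mathsf{Sign} \in \mathsf{Abs}(\mathcal{P}(\mathbb{Z}))$
(introduced in Example 1) and its refinement $\mathsf{Sign}_1$ below.

[Figure: Hasse diagrams of Sign, with elements $\mathbb{Z}$,
$\mathbb{Z}_{\leq 0}$, $\mathbb{Z}_{\neq 0}$, $\mathbb{Z}_{\geq 0}$,
$\mathbb{Z}_{<0}$, $\mathbb{Z}_{=0}$, $\mathbb{Z}_{>0}$, $\emptyset$, and of
$\mathsf{Sign}_1$, which additionally contains $\mathbb{Z}_{=1}$ below
$\mathbb{Z}_{>0}$.]

For the command $r \triangleq x := x - 1$ and the concrete point $P = \{1\}$ we
have


$\mathsf{Sign} \llbracket r \rrbracket^{\mathsf{Sign}_1} \mathsf{Sign}_1(P) = \mathsf{Sign} \llbracket r \rrbracket^{\mathsf{Sign}_1}(\mathbb{Z}_{=1}) = \mathbb{Z}_{=0}$


but


$\mathsf{Sign} \llbracket r \rrbracket^{\mathsf{Sign}_1} \mathsf{Sign}(P) = \mathsf{Sign} \llbracket r \rrbracket^{\mathsf{Sign}_1}(\mathbb{Z}_{>0}) = \mathbb{Z}_{\geq 0}$.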
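
Example 4 can be replayed by treating each abstract domain as a closure operator on sets of integers. The sketch assumes a finite window -20..20 in place of the integers, takes Sign to be the usual eight-element sign domain and Sign_1 to be Sign plus the singleton {1}; these choices and the names `W`, `elems`, `closure`, `dec` are ours.

```python
W = frozenset(range(-20, 21))  # finite stand-in for Z (assumption of this sketch)


def elems(pred):
    return frozenset(n for n in W if pred(n))


SIGN = [
    elems(lambda n: False),   # empty set
    elems(lambda n: n < 0),
    elems(lambda n: n == 0),
    elems(lambda n: n > 0),
    elems(lambda n: n <= 0),
    elems(lambda n: n != 0),
    elems(lambda n: n >= 0),
    W,                        # top
]
SIGN1 = SIGN + [frozenset({1})]  # refinement: also able to express x = 1


def closure(domain):
    """Closure induced by a Moore family: the least element containing S."""
    def close(S):
        S = frozenset(S) & W
        return min((a for a in domain if S <= a), key=len)
    return close


sign, sign1 = closure(SIGN), closure(SIGN1)


def dec(S):
    """Concrete semantics of x := x - 1."""
    return frozenset(n - 1 for n in S)


def bca_sign1(S):
    """Best correct abstraction of x := x - 1 in Sign_1, seen as a closure."""
    return sign1(dec(sign1(S)))


P = {1}
from_refined_input = sign(bca_sign1(sign1(P)))  # input abstracted in Sign_1
from_coarse_input = sign(bca_sign1(sign(P)))    # input abstracted in Sign
```

Starting from the refined abstraction {1} the result is the element "equal to 0", while starting from the coarser "greater than 0" the result is "greater than or equal to 0". A proof carried out in Sign_1 therefore says nothing, by itself, about the analysis that starts from Sign(P).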


Despite being necessary, the hypothesis of rule (refine-ext) cannot be checked
in practice because the bca $\llbracket r \rrbracket^{A'}$ of a composite
command r is not known by the analyser. To mitigate this issue, we present two
derived rules whose premises imply the premises of rule (refine-ext), hence
ensuring extensional soundness by means of Theorem 3.

The first rule we present replaces the requirement on the extensional bca
$\llbracket r \rrbracket^{A'}$ with requirements on the intensional
compositional abstraction $\llbracket r \rrbracket^{\sharp}_{A'}$ computed in
A'. For this reason, we call this rule (refine-int).


Proposition 1. The following rule (refine-int) is extensionally sound:

$\frac{\vdash_{A'} [P]\ r\ [Q] \quad A' \preceq A \quad A \llbracket r \rrbracket^{\sharp}_{A'} A(P) = A(Q)}{\vdash_A [P]\ r\ [Q]}\ (\mathsf{refine\text{-}int})$

It is worth noting that now the condition on the compositional abstraction
$\llbracket r \rrbracket^{\sharp}_{A'}$ can easily be checked by the analyser,
possibly alongside the analysis of r with $\mathrm{LCL}_{A'}$ or using a
stand-alone abstract interpreter. Moreover, this rule is as powerful as the
original (refine-ext) because it allows proving a logical completeness result
akin to Theorem 4.


Theorem 5 (Logical completeness of (refine-int)). Consider the proof system of
Fig. 3 with the addition of rules (refine-int) and (limit). If
$Q \leq \llbracket r \rrbracket P$ and
$\llbracket r \rrbracket^A \alpha(P) = \alpha(Q)$ then $\vdash_A [P]\ r\ [Q]$.


Just like logical completeness for (refine-ext), this result implies the existence of a 
refinement A’ in which it is possible to carry out the proof (possibly the concrete 
domain C). The discussion about how to find one is sketched in Section 4.3. 

The second derived rule we propose is simpler than (refine-ext), as it just 
checks the abstractions A(P) and A’(P), with no reference to the regular com- 
mand r nor to the postcondition Q. Since the premise is only on the precondition 
P, we call this rule (refine-pre). 


Proposition 2. The following rule (refine-pre) is extensionally sound:

$\frac{\vdash_{A'} [P]\ r\ [Q] \quad A' \preceq A \quad A'(P) = A(P)}{\vdash_A [P]\ r\ [Q]}\ (\mathsf{refine\text{-}pre})$


Rule (refine-pre) only requires a simple check at the application site instead of 
an expensive analysis of the program r, so it can be preferred in practice. 
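
The check required by (refine-pre) only compares two abstractions of the precondition. The sketch below illustrates it on sets of values of a single variable, using the interval domain and a hypothetical refinement of it whose elements are intervals possibly with a "hole" at 0. Both domains are modelled as closures on finite sets of integers, and the function names are ours.

```python
def int_closure(S):
    """Interval closure: every integer between min(S) and max(S)."""
    S = frozenset(S)
    return frozenset(range(min(S), max(S) + 1)) if S else frozenset()


def int_nonzero_closure(S):
    """Closure of the refinement: the interval hull, minus 0 when 0 is not in S."""
    S = frozenset(S)
    hull = int_closure(S)
    return hull if 0 in S else hull - {0}


def refine_pre_side_condition(refined, base, P):
    """The premise A'(P) = A(P) of (refine-pre)."""
    return refined(P) == base(P)


P = frozenset(range(-100, 101))  # y in [-100; 100]
R1 = P - {0}                     # y in [-100; 100] and y != 0

ok_on_P = refine_pre_side_condition(int_nonzero_closure, int_closure, P)
ok_on_R1 = refine_pre_side_condition(int_nonzero_closure, int_closure, R1)
```

On P the two abstractions coincide, so a proof carried out in the refined domain could be transferred to intervals. On R1 they differ (the refinement keeps the hole at 0), so the rule would not be applicable with that precondition.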

We present an example to highlight the advantages of this rule (as well as 
(refine-int)), which allows us to use different domains in the proof derivation of 
different parts of the program. 


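Example 5 (The use of (refine-pre)). Consider the two program fragments


$r_1 \triangleq (y\ \mathtt{!=}\ 0)?;\ y := \mathtt{abs}(y)$
$r_2 \triangleq x := y;\ \mathsf{while}\ (x > 1)\ \{ y := y - 1;\ x := x - 1 \}$
$\phantom{r_2} = x := y;\ ((x > 1)?;\ y := y - 1;\ x := x - 1)^*;\ (x\ \mathtt{<=}\ 1)?$


and the program $r \triangleq r_1; r_2$. Here abs is a function to compute the
absolute value, and we assume, for the sake of simplicity, that the analyser
knows its best abstraction. Consider the concrete domain
$\mathcal{P}(\mathbb{Z}^2)$, where a pair (n, m) denotes a state x = n, y = m,
and the initial state $P = (y \in [-100; 100])$, a logical description of the
concrete $\{(n, m) \mid m \in [-100; 100]\} \in \mathcal{P}(\mathbb{Z}^2)$. The
bca $\llbracket r \rrbracket^{Int}$ in the abstract domain of intervals is
locally complete on P (since P is expressible in Int), but the compositional
abstraction $\llbracket r \rrbracket^{\sharp}_{Int}$ is not:


$\llbracket r \rrbracket^{Int} \alpha(P) = Int(\llbracket r_2 \rrbracket \llbracket r_1 \rrbracket (\{(n, m) \mid m \in [-100; 100]\}))$
$\quad = Int(\llbracket r_2 \rrbracket (\{(n, m) \mid m \in [1; 100]\}))$
$\quad = Int(\{(1, 1)\})$
$\quad = ([1; 1] \times [1; 1])$,

while

$\llbracket r \rrbracket^{\sharp}_{Int} \alpha(P) = \llbracket r_2 \rrbracket^{\sharp}_{Int} \llbracket r_1 \rrbracket^{\sharp}_{Int} ([-\infty; +\infty] \times [-100; 100])$
$\quad = \llbracket r_2 \rrbracket^{\sharp}_{Int} \llbracket y := \mathtt{abs}(y) \rrbracket^{Int} ([-\infty; +\infty] \times [-100; 100])$
$\quad = \llbracket r_2 \rrbracket^{\sharp}_{Int} ([-\infty; +\infty] \times [0; 100])$
$\quad = ([1; 1] \times [0; 100]) \neq ([1; 1] \times [1; 1])$.


Fig. 5: Derivation of $\vdash_{Int_{\neq 0}} [P]\ r_1\ [R]$ for Example 5.

(transfer), from $\mathbb{C}^{Int_{\neq 0}}_{P}(\llbracket y\ \mathtt{!=}\ 0? \rrbracket)$: $\vdash_{Int_{\neq 0}} [P]\ y\ \mathtt{!=}\ 0?\ [R_1]$
(transfer), from $\mathbb{C}^{Int_{\neq 0}}_{R_1}(\llbracket y := \mathtt{abs}(y) \rrbracket)$: $\vdash_{Int_{\neq 0}} [R_1]\ y := \mathtt{abs}(y)\ [y \in [1; 100]]$
(relax), from the previous triple: $\vdash_{Int_{\neq 0}} [R_1]\ y := \mathtt{abs}(y)\ [R]$
(seq), from the first and third triples: $\vdash_{Int_{\neq 0}} [P]\ r_1\ [R]$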
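
The source of imprecision in Example 5 already shows up on the first fragment alone. The sketch below restricts attention to the variable y and to the fragment (y != 0)?; y := abs(y), and compares the best correct abstraction of the whole fragment with the composition of the best abstractions of its two basic commands in the interval domain. Intervals are pairs (low, high) over finite ranges, and the helper names are ours.

```python
def hull(S):
    """Interval abstraction of a nonempty finite set of integers."""
    return (min(S), max(S))


def gamma(interval):
    """Concretization of a finite interval."""
    low, high = interval
    return set(range(low, high + 1))


def assume_nonzero(S):  # (y != 0)?
    return {y for y in S if y != 0}


def assign_abs(S):      # y := abs(y)
    return {abs(y) for y in S}


def bca(f):
    """Best correct abstraction of f in the interval domain."""
    return lambda interval: hull(f(gamma(interval)))


P = (-100, 100)

# Extensional view: abstract the behaviour of the whole fragment at once.
extensional = bca(lambda S: assign_abs(assume_nonzero(S)))(P)

# Intensional view: compose the abstractions of the two basic commands.
after_guard = bca(assume_nonzero)(P)
intensional = bca(assign_abs)(after_guard)
```

The guard removes 0, but intervals cannot express the hole, so `after_guard` is again (-100, 100) and the assignment then yields (0, 100) instead of (1, 100). A domain able to represent "nonzero" closes exactly this gap.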


The issues are twofold. First, the analysis of $r_1$ in Int is incomplete, so we
need a more concrete domain. For instance $Int_{\neq 0}$, the Moore closure of
Int with the addition of the element $\mathbb{Z}_{\neq 0}$ representing the
property of being nonzero, would work. Intuitively, $Int_{\neq 0}$ contains all
intervals, possibly having a "hole" in 0. Formally

$Int_{\neq 0} = Int \cup \{I_{\neq 0} \mid I \in Int\}$

with $\gamma'(I_{\neq 0}) = \gamma(I) \setminus \{0\}$. However, note that there
is no need for a relational domain to analyze $r_1$ since variable x is never
mentioned in it. On the contrary, the analysis of $r_2$ requires a relational
domain to track the information that the value of variable x is equal to the
value of variable y. This suggests, for instance, using the octagons domain
Oct [15] to analyze $r_2$. It is worth noting that the domain of octagons Oct
would not be able to perform a locally complete analysis of $r_1$, for the same
reasons that the domain Int could not.

However, rule (refine-pre) allows us to combine these different proof
derivations. Since the program state between $r_1$ and $r_2$ can be precisely
represented in Int, we use this domain as a baseline and refine it to
$Int_{\neq 0}$ and Oct for the two parts respectively.

Let $R = (y \in \{1; 2; 100\})$, which is an under-approximation of the concrete
state in between $r_1$ and $r_2$ with the same abstraction in Int, so we can
prove the triple $\vdash_{Int} [P]\ r_1\ [R]$. Note that the concrete point 2
was added to R in order to have local completeness for (x > 1)? in $r_2$.
However, this triple cannot be proved in Int because
$\llbracket r_1 \rrbracket^{\sharp}_{Int}$ is not locally complete on P, so we
resort to (refine-pre) to change the domain to $Int_{\neq 0}$. The full
derivation in $Int_{\neq 0}$ is shown in Fig. 5, where
$R_1 = (y \in [-100; 100] \wedge y \neq 0)$ and we omitted for simplicity the
additional hypothesis of (relax).

Again $\llbracket r_2 \rrbracket^{Int}$ is locally complete on R in Int, but the
compositional analysis $\llbracket r_2 \rrbracket^{\sharp}_{Int}$ is not. Hence
to perform the derivation we resort to (refine-pre) to introduce relational
information in the abstract domain, using Oct instead of Int. Let
$Q = (x = 1 \wedge y = 1)$, that is the concrete output of the program, so that
we can prove $\vdash_{Int} [R]\ r_2\ [Q]$. The derivation of this triple is
given only in Appendix A, Fig. 6. However, the proof is just a straightforward
application of rules (seq), (iterate) and (transfer).

With those two derivations, the proof of the triple $\vdash_{Int} [P]\ r\ [Q]$
is straightforward using (refine-pre):


$\frac{\vdash_{Int_{\neq 0}} [P]\ r_1\ [R]}{\vdash_{Int} [P]\ r_1\ [R]}\ (\mathsf{refine\text{-}pre})$
$\frac{\vdash_{Oct} [R]\ r_2\ [Q]}{\vdash_{Int} [R]\ r_2\ [Q]}\ (\mathsf{refine\text{-}pre})$
$\frac{\vdash_{Int} [P]\ r_1\ [R] \quad \vdash_{Int} [R]\ r_2\ [Q]}{\vdash_{Int} [P]\ r\ [Q]}\ (\mathsf{seq})$


For the derivation to fit the page, we list here the additional hypotheses of the
rules. For the first application, $Int_{\neq 0} \preceq Int$ and
$Int_{\neq 0}(P) = P = Int(P)$. For the second, $Oct \preceq Int$ and
$Int(R) = (y \in [1; 100]) = Oct(R)$.

It is worth noting that, in this example, all applications of (refine-pre) can be
replaced by (refine-int). This means that the latter is also able to exploit
$Int_{\neq 0}$ and Oct to prove the triple in the very same way, but its
application requires more expensive abstract analyses than the simple checks of
(refine-pre).


While (refine-pre) is simpler than (refine-ext) and (refine-int), it is also weaker
in both a theoretical and a practical sense. On the one hand, $\mathrm{LCL}_A$
extended with this rule does not admit a logical completeness result; on the
other hand, there are situations in which, even though (refine-pre) allows a
derivation, the other rules are more effective. We show these two points by
examples. For the first, we propose a sound triple that $\mathrm{LCL}_A$
extended with (refine-pre) cannot prove. Since the example is quite technical,
here we only sketch the idea, and leave the details to Appendix A, Example 8.


Example 6 (Logical incompleteness of (refine-pre)). Consider the concrete
domain $C = \mathcal{P}(\mathbb{Z})$ of integers, the abstract domain Int of
intervals, the concrete point $P = \{-1, 1\}$ and commands
$r_1 \triangleq x\ \mathtt{!=}\ 0?$, $r_2 \triangleq x\ \mathtt{>=}\ 0?$ and
$r \triangleq r_1; r_2$. Then the triple
$\vdash_{Int} [P]\ r_1; r_2\ [\{1\}]$ is sound but cannot be proved in
$\mathrm{LCL}_A$ extended with (refine-pre).

There are two key observations for this example. First, all strict subsets
$P' \subset P$ are such that $Int(P') \subset Int(P)$. Moreover, for all
refinements $A' \preceq Int$ such that $A'(P) = Int(P)$ we have the same
condition, namely if $P' \subset P$ then $A'(P') \subset A'(P)$. This is because
for all $P' \subset P$ we have
$A'(P') \subseteq Int(P') \subset Int(P) = A'(P)$. Second,
$\llbracket r_1 \rrbracket P = P$. This means that all triples appearing in the
derivation tree of $\vdash_{Int} [P]\ r_1; r_2\ [\{1\}]$ have the same
precondition P. Since (refine-pre) requires $A'(P) = Int(P)$, all possible
applications of this rule change the abstract domain to some A' satisfying the
condition above. Since $\mathrm{LCL}_A$ computes under-approximations with the
same abstraction as the strongest postcondition, these two observations make it
impossible to under-approximate P further, both with (relax) and (refine-pre).
This in turn makes the triple not provable, because
$\llbracket r_2 \rrbracket$ is not locally complete on P in Int or in any
refinement satisfying $A'(P) = Int(P)$:


$A' \llbracket r_2 \rrbracket (P) = A'(\{1\}) \subseteq Int(\{1\}) = \{1\}$
$A' \llbracket r_2 \rrbracket A'(P) \supseteq \llbracket r_2 \rrbracket A'(P) = \llbracket r_2 \rrbracket (Int(P)) = \{0, 1\}$.


Example 8 in Appendix A exhibits the formal argument showing that this triple
cannot be proved.


As a corollary, this example (and, more generally, logical incompleteness) shows
that it is not always possible to find a refinement A' to carry out the proof
using (refine-pre). Another consequence of this incompleteness result is the
fact that, even when a command is locally complete in an abstract domain A, we
may need to reason about properties that are not expressible in A in order to
prove it, as (refine-pre) may not be sufficient.

Second, we present an example to illustrate that there are situations in which
(refine-int) is more practical than (refine-pre), even though they are both able to
prove the same triple.


Example 7. Consider the two program fragments


$r_1 \triangleq (y\ \mathtt{!=}\ 0)?;\ x := y;\ y := \mathtt{abs}(y)$
$r_2 \triangleq x := y;\ \mathsf{while}\ (x > 1)\ \{ y := y - 1;\ x := x - 1 \}$


and the program $r \triangleq r_1; r_2$. Consider also the initial state
$P = y \in [-100; 100]$.

This example is a variation of Example 5: the difference is the introduction
of the relational dependency x := y in $r_1$, which is partially stored in the
postcondition R of $r_1$. Because of this, Oct(R) and Int(R) are different, so
we cannot apply (refine-pre) to prove [R] $r_2$ [Q] for some Q.

Following Example 5, the domain $Int_{\neq 0}$ is able to infer on $r_1$ a
subset R of the strongest postcondition
$y \in [1; 100] \wedge y = \mathtt{abs}(x)$ with the same abstraction
$Int_{\neq 0}(R) = [-100; 100]_{\neq 0} \times [1; 100]$. However, for any such
R we cannot use (refine-pre) to prove the triple
$\vdash_{Int} [R]\ r_2\ [x = 1 \wedge y = 1]$ via Oct because
$Int(R) = x \in [-100; 100] \wedge y \in [1; 100]$ while
$Oct(R) = 1 \leq y \leq 100 \wedge -y \leq x \leq y$. More generally, any subset
of the strongest postcondition contains the relational information
$y = \mathtt{abs}(x)$, so relational domains like octagons and polyhedra [9] do
not have the same abstraction as the non-relational Int, preventing the use of
(refine-pre). However, we can apply (refine-int): considering
$R = (y \in \{1; 2; 100\} \wedge y = \mathtt{abs}(x))$,
$Q = (x = 1 \wedge y = 1)$ and
$r_w \triangleq \mathsf{while}\ (x > 1)\ \{ y := y - 1;\ x := x - 1 \}$, we have


$Int \llbracket r_2 \rrbracket^{\sharp}_{Oct} Int(R) = Int \llbracket r_2 \rrbracket^{\sharp}_{Oct} (x \in [-100; 100] \wedge y \in [1; 100])$
$\quad = Int \llbracket r_w \rrbracket^{\sharp}_{Oct} \llbracket x := y \rrbracket^{\sharp}_{Oct} (x \in [-100; 100] \wedge y \in [1; 100])$
$\quad = Int \llbracket r_w \rrbracket^{\sharp}_{Oct} (1 \leq y \leq 100 \wedge y = x)$
$\quad = Int(x = 1 \wedge y = 1)$
$\quad = Int(Q)$.

In this example, rule (refine-pre) can be applied to prove the triple, but it
requires relational information from the assignment x := y in $r_1$, hence
forcing the use of a relational domain (e.g., the reduced product [7] of Oct and
$Int_{\neq 0}$) for the whole r, making the analysis more expensive.
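
The mismatch between Int(R) and Oct(R) that blocks (refine-pre) in Example 7 can be computed directly for the finite set R = (y in {1; 2; 100} and y = abs(x)). In this sketch an octagon is represented by tight bounds on x, y, x + y and x - y, and a box is converted to the octagon it denotes so the two can be compared; the representation and the function names are ours.

```python
def box(points):
    """Interval abstraction of a finite set of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), max(xs), min(ys), max(ys))


def octagon(points):
    """Octagon abstraction: tight bounds on x, y, x + y and x - y."""
    def bounds(values):
        values = list(values)
        return (min(values), max(values))
    return {
        "x": bounds(x for x, _ in points),
        "y": bounds(y for _, y in points),
        "x+y": bounds(x + y for x, y in points),
        "x-y": bounds(x - y for x, y in points),
    }


def box_as_octagon(b):
    """The octagon denoting the same set of points as the box b."""
    xlo, xhi, ylo, yhi = b
    return {
        "x": (xlo, xhi),
        "y": (ylo, yhi),
        "x+y": (xlo + ylo, xhi + yhi),
        "x-y": (xlo - yhi, xhi - ylo),
    }


R = [(x, abs(x)) for x in (-100, -2, -1, 1, 2, 100)]

int_R = box(R)
oct_R = octagon(R)
same_abstraction = oct_R == box_as_octagon(int_R)  # premise of (refine-pre)
```

The octagon records x + y >= 0 and x - y <= 0, that is -y <= x <= y, which the box cannot express. Hence `same_abstraction` is false and the side condition Oct(R) = Int(R) of (refine-pre) fails for this R.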


4.3 Choosing The Refinement 


All three new rules allow combining different domains in the same derivation,
but do not define an algorithm because the choice of the right refinement to
use is nondeterministic. A crucial point for their applicability is a strategy to
select the refined abstract domain. While we have not addressed this problem
yet, we believe there are some interesting starting points in the literature.

As already anticipated in previous sections, we settled the question from
a theoretical point of view. Logical completeness results for (refine-ext)
(Theorem 4) and (refine-int) (Theorem 5) imply the existence of a domain in
which it is possible to complete the proof (if this were not the case, then the
proof could not be completed in any domain, against logical completeness).
However, the proofs of those theorems exhibit the concrete domain C as an
example, which is infeasible in general. Dually, as (refine-pre) is logically
incomplete (Example 6), there are triples that cannot be proved in any domain
with it.

As more practical alternatives, we envisage some possibilities. First, we are 
studying relationships with counterexample-guided abstraction refinement (CE- 
GAR) [4], which is a technique that exploits refinement in the context of abstract 
model checking. However, CEGAR and our approach seem complementary. On 
the one hand, our refinement rules allow a dynamic change of domain, during 
the analysis and only for a part of it, while CEGAR performs a static refinement 
and then a new analysis of the whole transition system in the new, more precise 
domain. On the other hand, our rules lack an instantiation technique, while for 
CEGAR there are effective algorithms available to pick a suitable refinement. 

Second, local completeness shells [3] were proposed as an analogue of
completeness shells [11] for local completeness. In the article, the authors
proposed to use local completeness shells to perform abstract interpretation
repair, a technique to refine the abstract domain depending on the program to
analyse, just like CEGAR does for abstract model checking. Abstract
interpretation repair works well with $\mathrm{LCL}_A$, and could be a way to
decide the best refinement for one of our rules in the presence of a failed local
completeness proof obligation. The advantage of combining repair with our new
rules is given by the possibility of discarding the refined domain just after
its use in a subderivation instead of using it to carry out the whole
derivation. Investigation in this direction is ongoing.

Another related approach, which shares some common ground with CEGAR,
is Lazy (Predicate) Abstraction [12,14]. Both our approach and Lazy Abstraction
exploit different abstract domains for different parts of the proof, refining
them as needed. The key difference is that Lazy Abstraction unwinds the control
flow graph (CFG) of the program (with techniques to handle loops) while we work
inductively on the syntax. This means that, when Lazy Abstraction refines a
domain, it must use it from that point onward (unless it finds a loop
invariant). On the other hand, our method can change abstract domain even for
different parts of sequential code. However, the technique used in Lazy
Abstraction (basically to trace a counterexample back with a theorem prover
until it is either found to be spurious or proved to be true) could be
applicable to $\mathrm{LCL}_A$: a failed local completeness proof obligation in
(transfer) can be traced back with a theorem prover and the failed proof can be
used to understand how to refine the abstract domain.


Proof system                         | Extensional | Logical completeness
Plain $\mathrm{LCL}_A$               | ✗           | ✗
$\mathrm{LCL}_A$ + (refine-ext)      | ✓           | ✓
$\mathrm{LCL}_A$ + (refine-int)      | ✓           | ✓
$\mathrm{LCL}_A$ + (refine-pre)      | ✓           | ✗

Table 1: Comparison of the proof systems


5 Conclusions 


In this paper, we have proposed a logical framework to prove both correctness
and incorrectness of a program exploiting locally complete abstractions. Indeed,
from any provable triple [P] r [Q] we can either prove that r meets an expressible
specification Spec or find a concrete counterexample in Q. Unlike the original
$\mathrm{LCL}_A$ [2], which was proved to be intensionally sound, our framework
is extensionally sound, meaning that it is able to prove more properties about
programs. To achieve this, our inference rules are based on the best correct
abstraction of a program behaviour instead of a generic abstract interpreter.
The key feature of our proof systems is the ability to exploit different abstract
domains to analyse different portions of the whole program. In particular, the
domains are selected among the refinements of a chosen abstract domain from
which the analysis begins. The main advantage of our extensional approach is
the possibility of proving many triples that could not be proved in
$\mathrm{LCL}_A$ because of the way the program is written. In more detail, we
presented three new rules to refine the abstract domain, each of which can be
added independently to the proof system with a different complexity-precision
trade-off.

Table 1 summarizes the properties $\mathrm{LCL}_A$ enjoys when extended with
different rules, and Figure 1 from the Introduction graphically compares the
logical strength of these proof systems. (refine-ext) is the most general rule,
from which the other two, (refine-int) and (refine-pre), are derived. The former
turns out to be as strong as (refine-ext), since they are both logically
complete, while the latter is simpler to use, although weaker.

Future work. In principle completeness could be achieved by either refining or
simplifying the abstract domain [11]. In this article we have only focused on
refinement rules for local completeness, but we are investigating some
simplification rules as well as their relation to the ones presented in this
paper. To date, domain simplification seems theoretically weaker, but apparently
it can accommodate techniques useful in practice that are beyond the reach of
refinement rules.

While the new rules we introduced are relevant from both a theoretical and a
practical point of view, they do not define an algorithm because of their
nondeterminism: we need techniques to determine when a change of abstract domain
is needed and how to choose the most convenient new domain. We believe these
two issues are actually related. For instance, if the analysis is unable to satisfy
a local completeness proof obligation to apply (transfer), a heuristic may
determine both what additional information is needed to make it true (i.e., how
to refine the abstract domain) and where that additional information came from
(i.e., when to refine). We briefly discussed in Section 4.3 some possibilities to
perform this choice. Ideally, one would systematically select an off-the-shelf
abstract domain best suited to deal with each code fragment, and the heuristic
would inspect the proof obligations and exploit some sort of catalog that can
track suitable abstract domains that are locally complete for the code and input
at hand, or derive on-the-fly some convenient domain refinement as done, e.g.,
by partition refinement. To this aim, we intend to investigate a mutual
exchange of ideas between CEGAR and our approach, and to integrate abstract
interpretation repair into our framework.


Acknowledgments. We thank the anonymous referees for their comments, which
helped us improve the presentation and the discussion of related work.


Appendix A Proofs and Supplementary Material 


A.1 Extensional Soundness (Theorem 2) 


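Proof (Proof of Theorem 2). First we remark that points (1) and (3) imply
point (2):

\[
\begin{array}{rll}
\alpha(Q) & \le \alpha(\llbracket r \rrbracket P) & \text{[(1) and monotonicity of } \alpha\text{]} \\
          & \le \llbracket r \rrbracket^{A} \alpha(P) & \text{[soundness of } \llbracket r \rrbracket^{A}\text{]} \\
          & = \alpha(Q) & \text{[(3)]}
\end{array}
\]

So all the lines are equal, in particular $\alpha(Q) = \alpha(\llbracket r \rrbracket P)$. The proof is then by
induction on the derivation tree of $\vdash_A [P]\ r\ [Q]$, but we only have to prove (1)
and (3) because of the observation above. We only include one inductive case as
an example; the others are standard.

(seq): (1) $Q \le \llbracket r_2 \rrbracket R \le \llbracket r_2 \rrbracket(\llbracket r_1 \rrbracket P) = \llbracket r_1; r_2 \rrbracket P$, where the inequalities follow
from the inductive hypotheses and monotonicity of $\llbracket r_2 \rrbracket$.

(3) We recall that $\llbracket r_1; r_2 \rrbracket^{A} \le \llbracket r_2 \rrbracket^{A} \llbracket r_1 \rrbracket^{A}$.

\[
\begin{array}{rll}
\alpha(Q) & \le \alpha(\llbracket r_1; r_2 \rrbracket P) & \text{[(1) and monotonicity of } \alpha\text{]} \\
          & \le \llbracket r_1; r_2 \rrbracket^{A} \alpha(P) & \text{[soundness of } \llbracket r \rrbracket^{A}\text{]} \\
          & \le \llbracket r_2 \rrbracket^{A} \llbracket r_1 \rrbracket^{A} \alpha(P) & \text{[recalled above]} \\
          & = \llbracket r_2 \rrbracket^{A} \alpha(R) & \text{[inductive hp]} \\
          & = \alpha(Q) & \text{[inductive hp]}
\end{array}
\]

So all the lines are equal, in particular $\llbracket r_1; r_2 \rrbracket^{A} \alpha(P) = \alpha(Q)$.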


A.2 Soundness and Completeness of (refine-ext) 


This technical lemma is used in the following proofs. 


Lemma 1. If $A' \preceq A$ then $A = AA' = A'A$.


Proof. Fix a concrete element $c \in C$. Since $A' \preceq A$ we have $c \le A'(c) \le A(c)$.
Applying $A$, by monotonicity we get $A(c) \le AA'(c) \le AA(c) = A(c)$, where
the last equality is idempotency of $A$. This means $A = AA'$. Now consider
$A'A(c)$. Since $A$ is a closure operator, $A'A(c) \le A(A'A(c))$. But we just showed
$AA'(A(c)) = A(A(c)) = A(c)$, so $A'A(c) \le A(c)$. Lastly, since $A'$ is a closure
operator too, $A(c) \le A'A(c)$. Hence $A(c) \le A'A(c) \le A(c)$, so $A(c) = A'A(c)$.


We point out that, by injectivity of $\gamma$, this also means $\alpha\gamma'\alpha' = \alpha$.
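
As a sanity check of Lemma 1, the following minimal sketch enumerates a tiny concrete domain and tests the two equalities pointwise. The universe {-1, 0, 1}, the interval closure `interval` (playing the role of A), and the refinement `refined` (intervals plus the extra point P = {-1, 1}, playing the role of A') are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations

UNIVERSE = (-1, 0, 1)                      # a tiny concrete universe (assumption)
P = frozenset({-1, 1})

def subsets(xs):
    """All subsets of xs, i.e. the concrete domain C = powerset(UNIVERSE)."""
    return [frozenset(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]

def interval(c):
    """Closure A: least interval of UNIVERSE containing c."""
    if not c:
        return frozenset()
    return frozenset(x for x in UNIVERSE if min(c) <= x <= max(c))

def refined(c):
    """Closure A': intervals plus the extra point P, so A' refines A."""
    return P if c == P else interval(c)

def lemma1_holds():
    """Check A' <= A pointwise and A = AA' = A'A on every concrete element."""
    for c in subsets(UNIVERSE):
        if not refined(c) <= interval(c):
            return False
        if not (interval(c) == interval(refined(c)) == refined(interval(c))):
            return False
    return True

print(lemma1_holds())
```

The check is exhaustive only for this three-element universe; it illustrates the statement and is no substitute for the proof above.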


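Proof (Proof of Theorem 3). We recall that the intuitive premise
$A\llbracket r \rrbracket^{A'} A(P) = A(Q)$ of the rule formally is
$\alpha\gamma'\llbracket r \rrbracket^{A'}\alpha' A(P) = \alpha(Q)$. Since the proof of Theorem 2
is by induction, we can extend it by just proving the inductive case for
(refine-ext).

(1) It is the same as point (1) of extensional soundness (Theorem 2) applied to
$\vdash_{A'} [P]\ r\ [Q]$, since this conclusion does not depend on the abstract domain.
(2-3)

\[
\begin{array}{rll}
\alpha(Q) & \le \alpha(\llbracket r \rrbracket P) & \text{[(1) and monotonicity of } \alpha\text{]} \\
          & \le \llbracket r \rrbracket^{A} \alpha(P) & \text{[soundness of } \llbracket r \rrbracket^{A}\text{]} \\
          & = \alpha \llbracket r \rrbracket \gamma\alpha(P) & \text{[definition]} \\
          & = \alpha\gamma'\alpha' \llbracket r \rrbracket \gamma'\alpha'\gamma\alpha(P) & \text{[Lemma 1]} \\
          & = \alpha\gamma' \llbracket r \rrbracket^{A'} \alpha' A(P) & \text{[definition]} \\
          & = \alpha(Q) & \text{[hypothesis of the rule]}
\end{array}
\]

Hence all the lines are equal; in particular $\alpha(\llbracket r \rrbracket P) = \alpha(Q)$ and
$\llbracket r \rrbracket^{A} \alpha(P) = \alpha(Q)$.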


22 F. Ascari et al. 


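Proof (Proof of Theorem 4). First, the hypotheses of the theorem imply
$\mathbb{C}^{A}_{P}(\llbracket r \rrbracket)$:

\[
\begin{array}{rll}
\llbracket r \rrbracket^{A} \alpha(P) & = \alpha(Q) & \text{[hp of the theorem]} \\
  & \le \alpha(\llbracket r \rrbracket P) & \text{[monotonicity of } \alpha \text{ and hp of the theorem } Q \le \llbracket r \rrbracket P\text{]} \\
  & \le \llbracket r \rrbracket^{A} \alpha(P) & \text{[soundness of } \llbracket r \rrbracket^{A}\text{]}
\end{array}
\]

Hence $\alpha(\llbracket r \rrbracket P) = \llbracket r \rrbracket^{A} \alpha(P) = \alpha \llbracket r \rrbracket \gamma\alpha(P)$, that is local completeness.
Moreover $\alpha(Q) = \alpha(\llbracket r \rrbracket P)$.

Now consider a triple $P, r, Q$ satisfying the hypotheses. If $Q < \llbracket r \rrbracket P$, using
(relax) we get

\[
\dfrac{P \le P \le A(P) \qquad \vdash_A [P]\ r\ [\llbracket r \rrbracket P] \qquad Q \le \llbracket r \rrbracket P \le A(Q)}
      {\vdash_A [P]\ r\ [Q]}\ (\text{relax})
\]

But the first condition is trivial, and the third one is made of $Q \le \llbracket r \rrbracket P$ (the
hypothesis) and $\llbracket r \rrbracket P \le A(Q)$, which follows because $\alpha(\llbracket r \rrbracket P) = \alpha(Q)$ (shown
above) and in a GC this implies $\llbracket r \rrbracket P \le \gamma\alpha(Q) = A(Q)$. Hence without loss of
generality we can assume $Q = \llbracket r \rrbracket P$.

Now we want to apply (refine-ext) to move to the concrete domain $C$. Clearly
$C \preceq A$. The last hypothesis of the rule can be readily verified recalling that
$\llbracket r \rrbracket^{C} = \llbracket r \rrbracket$ and $\alpha' = \gamma' = \mathrm{id}_C$:

\[
\begin{array}{rl}
\alpha \llbracket r \rrbracket^{C} A(P) & = \alpha \llbracket r \rrbracket A(P) \\
  & = \llbracket r \rrbracket^{A} \alpha(P) \\
  & = \alpha(\llbracket r \rrbracket P)
\end{array}
\]

so if we can show $\vdash_C [P]\ r\ [\llbracket r \rrbracket P]$ we can apply (refine-ext) to prove the triple
$\vdash_A [P]\ r\ [\llbracket r \rrbracket P]$:

\[
\dfrac{\vdash_C [P]\ r\ [\llbracket r \rrbracket P] \qquad C \preceq A \qquad A\llbracket r \rrbracket^{C} A(P) = A(\llbracket r \rrbracket P)}
      {\vdash_A [P]\ r\ [\llbracket r \rrbracket P]}\ (\text{refine-ext})
\]

Lastly, we resort to logical completeness of $\mathrm{LCL}_A$ (cf. [2], Th 5.11) to say that
the triple $\vdash_C [P]\ r\ [\llbracket r \rrbracket P]$ is provable. The hypotheses of that theorem are satisfied:
all expressions are globally complete in the concrete domain $C$, $\llbracket r \rrbracket P \le \llbracket r \rrbracket P$ and
$\llbracket r \rrbracket^{C} \mathrm{id}_C(P) = \llbracket r \rrbracket P = \mathrm{id}_C(\llbracket r \rrbracket P)$, where we used $\alpha' = \mathrm{id}_C$ and $\llbracket r \rrbracket^{C} = \llbracket r \rrbracket$.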


A.3 Derived Refinement Rules 


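Proof (Proof of Proposition 1). We show that the hypotheses of (refine-int)
imply those of (refine-ext). This means that whenever we can apply the former
we could also apply the latter, which in turn means Theorem 3 ensures extensional
soundness.

The first two hypotheses $\vdash_{A'} [P]\ r\ [Q]$ and $A' \preceq A$ are shared among the
two rules, so we only have to show $\alpha\gamma'\llbracket r \rrbracket^{A'}\alpha' A(P) = \alpha(Q)$. We recall that
$\vdash_{A'} [P]\ r\ [Q]$ implies $Q \le \llbracket r \rrbracket P$ by extensional soundness.

\[
\begin{array}{rll}
\alpha(Q) & \le \alpha(\llbracket r \rrbracket P) & \text{[} Q \le \llbracket r \rrbracket P \text{ and monotonicity of } \alpha\text{]} \\
          & \le \llbracket r \rrbracket^{A} \alpha(P) & \text{[soundness of } \llbracket r \rrbracket^{A}\text{]} \\
          & = \alpha \llbracket r \rrbracket A(P) & \text{[definition]} \\
          & = \alpha\gamma'\alpha' \llbracket r \rrbracket A'A(P) & \text{[Lemma 1]} \\
          & = \alpha\gamma' \llbracket r \rrbracket^{A'} \alpha' A(P) & \text{[definition]} \\
          & \le \alpha\gamma' \llbracket r \rrbracket^{\sharp}_{A'} \alpha' A(P) & \text{[} \llbracket r \rrbracket^{A'} \le \llbracket r \rrbracket^{\sharp}_{A'} \text{]} \\
          & = \alpha(Q) & \text{[last hypothesis of the rule]}
\end{array}
\]

Hence all the lines are equal, and in particular $\alpha\gamma'\llbracket r \rrbracket^{A'}\alpha' A(P) = \alpha(Q)$.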


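Proof (Proof of Theorem 5). The proof is the same as that of Theorem 4, the
only difference being that to apply (refine-int) we need to show
$A\llbracket r \rrbracket^{\sharp}_{C} A(P) = A(\llbracket r \rrbracket P)$ instead of $A\llbracket r \rrbracket^{C} A(P) = A(\llbracket r \rrbracket P)$. However, since in the
concrete domain $\llbracket r \rrbracket^{\sharp}_{C} = \llbracket r \rrbracket^{C} = \llbracket r \rrbracket$, the proof still holds.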


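Proof (Proof of Proposition 2). As in the proof of Proposition 1 above, we show
that the hypotheses of (refine-pre) imply those of (refine-ext).

The first two hypotheses $\vdash_{A'} [P]\ r\ [Q]$ and $A' \preceq A$ are shared among the
two rules, so we only have to show $\alpha\gamma'\llbracket r \rrbracket^{A'}\alpha' A(P) = \alpha(Q)$. We recall that
$\vdash_{A'} [P]\ r\ [Q]$ implies by extensional soundness (1) $Q \le \llbracket r \rrbracket P$ and
(3) $\llbracket r \rrbracket^{A'} \alpha'(P) = \alpha'(Q)$.

\[
\begin{array}{rll}
\alpha(Q) & \le \alpha(\llbracket r \rrbracket P) & \text{[} Q \le \llbracket r \rrbracket P \text{ and monotonicity of } \alpha\text{]} \\
          & \le \llbracket r \rrbracket^{A} \alpha(P) & \text{[soundness of } \llbracket r \rrbracket^{A}\text{]} \\
          & = \alpha \llbracket r \rrbracket A(P) & \text{[definition]} \\
          & = \alpha \llbracket r \rrbracket A'(P) & \text{[hp of the rule]} \\
          & = \alpha\gamma'\alpha' \llbracket r \rrbracket A'(P) & \text{[Lemma 1]} \\
          & = \alpha\gamma' \llbracket r \rrbracket^{A'} \alpha'(P) & \text{[definition]} \\
          & = \alpha\gamma'\alpha'(Q) & \text{[extensional soundness (3)]} \\
          & = \alpha(Q) & \text{[Lemma 1]}
\end{array}
\]

Hence all the lines are equal, and in particular $\alpha\gamma'\llbracket r \rrbracket^{A'}\alpha' A(P) = \alpha(Q)$.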


Details about Example 5. The full derivation of the triple Foc |R] r2 [Q] for 
Example 5 is shown in Fig. 6, rotated and split to fit the page. The command 
r= (x > 1)?; y := y- 1; x := x - 1 is iterated with the Kleene star and 
we let Ro = (y € {1;2;100} A x = y). We also used the logical implication 
Rə => (y € {1;99} Ax = y), both explicitly and implicitly in the equivalence 
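Ro V (y € {1,99} Ax = y) = Ro.


\[
\dfrac{
  \dfrac{
    \dfrac{\mathbb{C}^{\mathrm{Int}_P}_{P}(\llbracket \texttt{x != 0?} \rrbracket)}
          {\vdash_{\mathrm{Int}_P} [P]\ \texttt{x != 0?}\ [P]}\ (\text{transfer})
    \qquad
    \dfrac{\mathbb{C}^{\mathrm{Int}_P}_{P}(\llbracket \texttt{x >= 0?} \rrbracket)}
          {\vdash_{\mathrm{Int}_P} [P]\ \texttt{x >= 0?}\ [Q]}\ (\text{transfer})
  }{\vdash_{\mathrm{Int}_P} [P]\ r_1; r_2\ [Q]}\ (\text{seq})
  \qquad \mathrm{Int}_P \preceq \mathrm{Int}
  \qquad \mathrm{Int}(\llbracket r \rrbracket^{\sharp}_{\mathrm{Int}_P}(\mathrm{Int}(P))) = \mathrm{Int}(Q)
}{\vdash_{\mathrm{Int}} [P]\ r_1; r_2\ [Q]}\ (\text{refine-int})
\]


Fig. 7: Derivation of $\vdash_{\mathrm{Int}} [P]\ r\ [Q]$ for Example 8.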


Example 8 (Supplement to Example 6). Consider the concrete domain $C = \wp(\mathbb{Z})$
of integers, the abstract domain Int of intervals, the concrete points $P = \{-1, 1\}$
and $Q = \{1\}$, commands $r_1 \triangleq \texttt{x != 0?}$, $r_2 \triangleq \texttt{x >= 0?}$ and $r \triangleq r_1; r_2$. Let
$f_1 = \llbracket r_1 \rrbracket$, $f_2 = \llbracket r_2 \rrbracket$ and $f = \llbracket r \rrbracket = f_2 \circ f_1$. Observe that in the concrete semantics
$f_1(P) = P$ and $f(P) = f_2(P) = \{1\}$. Consider $\mathrm{LCL}_A$ extended with (refine-pre),
and let us show that we cannot prove $\vdash_{\mathrm{Int}} [P]\ r\ [Q]$. Inspecting the logic, we
can only apply three rules to prove this triple: (relax), (refine-pre) or (seq). To
apply rule (relax) we would need either an under-approximation $P'$ of $P$ with the
same abstraction, which does not exist, or an over-approximation of $Q$, which would
be unsound since $Q = f(P)$. Hence we cannot apply (relax). Suppose we apply
(refine-pre): any $A'$ used in the rule should satisfy $A' \preceq \mathrm{Int}$ and $A'(P) = \mathrm{Int}(P)$;
as we remarked in Example 6 this means that $P' \subset P$ implies $A'(P') \subset A'(P)$.
Again this means we cannot apply (relax) even after the domain refinement. The
only rule that can be applied is then (seq): to do that, we must prove two triples
$\vdash_{A'} [P]\ r_1\ [R]$ and $\vdash_{A'} [R]\ r_2\ [Q]$. Irrespective of how we prove the first triple,
by soundness (Theorem 2) we have $R \subseteq f_1(P) = P$ and $A'(R) = A'(f_1(P)) = A'(P)$,
so again $R = P$. Now we should prove a triple $\vdash_{A'} [P]\ r_2\ [Q]$, but this is
impossible since by soundness this would imply local completeness of $\llbracket r_2 \rrbracket = f_2$
on $P$ in $A'$, which does not hold:

\[
\begin{array}{l}
A' f_2(P) = A'(\{1\}) \subseteq \mathrm{Int}(\{1\}) = \{1\} \\
A' f_2 A'(P) \supseteq f_2 A'(P) = f_2(\mathrm{Int}(P)) = \{0, 1\}
\end{array}
\]
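
The local (in)completeness facts used in this example can be replayed on finite sets. The sketch below is only an illustration: `Int`, `refined` (standing for the refinement Int ∪ {P}) and the helper `locally_complete` are names introduced here, and sets of integers are represented extensionally as Python frozensets.

```python
P = frozenset({-1, 1})

def Int(c):
    """Interval closure of a finite set of integers, returned as a set."""
    return frozenset(range(min(c), max(c) + 1)) if c else frozenset()

def refined(c):
    """Closure of the refinement Int + {P}: P itself becomes expressible."""
    return P if c == P else Int(c)

def f1(c):
    """Concrete semantics of the guard x != 0?."""
    return frozenset(x for x in c if x != 0)

def f2(c):
    """Concrete semantics of the guard x >= 0?."""
    return frozenset(x for x in c if x >= 0)

def locally_complete(f, closure, c):
    """Local completeness of f on input c: A f(c) = A f A(c)."""
    return closure(f(c)) == closure(f(closure(c)))

print(sorted(Int(f2(P))), sorted(Int(f2(Int(P)))))
print(locally_complete(f2, Int, P), locally_complete(f2, refined, P))
```

In this encoding the guard `x >= 0?` is not locally complete on P in Int, because abstracting P first adds the spurious value 0, while it becomes locally complete once P is expressible in the domain.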


Observe that, if we add (refine-int) to the proof system, we can use it to
change the domain to one where we can express $P$ (for instance, the concrete
domain $\wp(\mathbb{Z})$ or the refinement $\mathrm{Int} \cup \{P\}$) to prove the triple applying (seq) and
then (transfer) on both subtrees, as shown in Fig. 7.


Abstract. We construct novel thread-modular analyses that track relational
information for potentially overlapping clusters of global variables
— given that they are protected by common mutexes. We provide a frame- 
work to systematically increase the precision of clustered relational anal- 
yses by splitting control locations based on abstractions of local traces. As 
one instance, we obtain an analysis of dynamic thread creation and join- 
ing. Interestingly, tracking less relational information for globals may re- 
sult in higher precision. We consider the class of 2-decomposable domains 
that encompasses many weakly relational domains (e.g., Octagons). For 
these domains, we prove that maximal precision is attained already for 
clusters of globals of sizes at most 2. 


Keywords: thread-modular relational abstract interpretation, collect- 
ing local trace semantics, clusters, dynamic thread creation, concurrency 


1 Introduction 


Tracking relationships between program variables is indispensable for proving 
properties of programs or verifying the absence of certain programming errors 
[14, 16, 33]. Inferring relational properties is particularly challenging for multi- 
threaded programs as all interferences by other threads that may happen in 
parallel, must be taken into account. In such an environment, only relational 
properties between globals protected by common mutexes are likely to per- 
sist throughout program execution. Generally, relations on clusters consisting 
of fewer variables are less brittle than those on larger clusters. Moreover, mono- 
lithic relational analyses employing, e.g., the polyhedral abstract domain are 
known to be notoriously expensive [36, 54]. Tracking smaller clusters may even 
be more precise than tracking larger clusters [19]. 


Example 1. Consider the following program. All accesses to globals g, h, and i 
are protected by the mutex a. 


© The Author(s) 2023

main:                                  t1:             t2:
  x = create(t1); y = create(t2);        lock(a);        lock(a);
  lock(a);                               x = h;          g = ?; h = ?;
  g = ?; h = ?; i = ?;                   i = x;          unlock(a);
  unlock(a); r = join(y); lock(a);       unlock(a);      return 0;
  z = ?; g = z; h = z; i = z;            return 1;
  unlock(a); lock(a);
  // ASSERT(h==i); (1)
  // ASSERT(g==h); (2)
  unlock(a);


In this program, the main thread creates two new threads, starting at t1 and t2,
respectively. Then it locks the mutex a to set all globals non-deterministically
to some value and unlocks a again. After having joined the thread t2, it locks
a again and sets all globals to the same unknown value and unlocks a again.
Thread t1 sets i to the value of h. Thread t2 sets g and h to (potentially different)
unknown values. Assume we are interested in equalities between globals. In order
to succeed in showing assertion (1), it is necessary to detect that the main thread
is unique and thus cannot read its past writes since these have been overwritten.
Additionally, the analysis needs to certify that thread t2 also is unique, has been
joined before the assertion, and that its writes must also have been overwritten.

For an analysis to prove assertion (2), propagating a joint abstraction of the
values of all globals protected by a does not suffice: At the unlock of a in t1,
g=h need not hold. If this monolithic relation is propagated to the last lock of
a in main, (2) cannot be shown, even though t1 modifies neither g nor h.


Here we show that the loss of precision indicated in the example can be
remedied by replacing the monolithic abstraction of all globals protected by a
mutex with suitably chosen subclusters. In the example, we propose to instead
consider the subclusters {g,h} and {h,i} separately. As t1 does not write any
values to the cluster {g,h}, the imprecise relation $\top$ is not propagated to the
main thread and assertion (2) can be shown.
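
The effect can be replayed on a toy encoding of Example 1. The sketch below is a strong simplification under stated assumptions: an abstract relation is a set of known equalities between globals (the empty set plays the role of $\top$), the join keeps the equalities known on both sides, and the list `published` hand-picks the two contributions still relevant at the last lock of a in main (the last unlock in main and the unlock in t1), assuming the analysis has already excluded the overwritten writes. The rule that a thread contributes to a cluster only if it wrote one of its globals is likewise an assumption of this sketch.

```python
from functools import reduce

def eq(a, b):
    """The equality a == b, encoded as an unordered pair."""
    return frozenset({a, b})

ALL_EQ = frozenset({eq("g", "h"), eq("h", "i"), eq("g", "i")})

def join(r, s):
    """Least upper bound: keep only the equalities known on both sides."""
    return r & s

def restrict(r, cluster):
    """Forget every equality mentioning a variable outside the cluster."""
    return frozenset(e for e in r if e <= cluster)

# (writer, globals written, relation between globals at that unlock of a)
published = [
    ("main", {"g", "h", "i"}, ALL_EQ),
    ("t1", {"i"}, frozenset({eq("h", "i")})),
]

def monolithic():
    """One relation over all globals protected by a."""
    return reduce(join, (rel for _, _, rel in published))

def clustered(clusters):
    """One relation per cluster; only writers of the cluster contribute."""
    result = set()
    for q in clusters:
        contribs = [restrict(rel, q) for _, written, rel in published if written & q]
        result |= reduce(join, contribs)
    return frozenset(result)

clusters = [frozenset({"g", "h"}), frozenset({"h", "i"})]
print(eq("g", "h") in monolithic(), eq("g", "h") in clustered(clusters))
```

With the monolithic relation the equality g == h is lost in the join with the contribution of t1, whereas the cluster {g,h} only ever receives the relation published by main.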

To fine-tune the analysis, we rely on weakly relational domains. A variety 
of weakly relational domains have been proposed in the literature such as Two 
Variables Per Inequality [53], Octagons [36, 37], or simplifications thereof [33, 35]. 
The technical property of interest which all these domains have in common is that 
each abstract relation can be reconstructed from its projections onto subclusters 
of variables of size at most 2. We call such domains 2-decomposable. Beyond the 
numerical 2-decomposable domains, also non-numerical 2-decomposable domains 
can be constructed such as a domain relating string names and function pointers. 

Based on 2-decomposable domains, we design thread-modular relational anal- 
yses of globals which may attain additional precision by taking local knowledge 
of threads into account. Therefore, we do not rely on a global trace semantics, 
but on a local trace semantics which formalizes for each thread that part of the 
computational past it can observe [48]. Abstract values for program points de- 
scribe the set of all reaching local traces. Likewise, values recorded for observable 
actions are abstractions of all local traces ending in the corresponding action. 
Such observable actions are, e.g., unlock operations for mutexes. The abstract
values are then refined by taking finite abstractions of local traces into account.
To this end, we propose a generic framework that re-uses the components of any 
base analysis as black boxes. Our contributions can be summarized as follows: 


— We provide new relational analyses of globals as abstractions of the local 
trace semantics based on overlapping variable clusters (Sections 3, 4, and 8). 

— Our analysis deals with dynamically created and joined threads, whose thread 
ids may, e.g., be communicated to other threads via variables and which may 
synchronize via mutexes (Section 3). 

— We provide a generic scheme to incorporate history-based arguments into the 
analysis by taking finite abstractions of local traces into account (Section 5). 

— We give an analysis of dynamically created thread ids as an instance of 
our generic scheme. We apply this to exclude self-influences or reads from 
threads that cannot possibly run in parallel (Sections 6 and 7). 

— We prove that some loss of precision of relational analyses can be avoided 
by tracking all subclusters of variables. For the class of 2-decomposable 
relational domains, we prove that tracking variable clusters of size greater 
than 2 can be abandoned without precision loss (Section 8). 


The analyses in this paper have all been implemented; a report on a practical
evaluation is included in Section 9, and Section 10 details related work.


2 Relational Domains 


First, we define the notion of relational domain employed in the description of 
our analysis. Let Vars be a set of variables, potentially of different types. We 
assume all configurations and assignments to be well-typed, i.e., the type of the 
(abstract) value matches the one specified for a variable. For each type $\tau$ of
values, we assume a complete lattice $\mathcal{V}^\sharp_\tau$ of abstract values abstracting the
respective concrete values from $\mathcal{V}_\tau$. Let $\mathcal{V}^\sharp$ denote the collection of these lattices,
and $\mathit{Vars} \to_\bot \mathcal{V}^\sharp$ denote the set of all type-consistent assignments $\sigma$ from
variables to non-$\bot$ abstract values, extended with a dedicated least element (also
denoted by $\bot$), and equipped with the induced ordering. A relational domain $\mathcal{R}$
then is a complete lattice which provides the following operations


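\[
\begin{array}{llll}
\llbracket x \leftarrow e \rrbracket^\sharp_{\mathcal{R}} : \mathcal{R} \to \mathcal{R} & \text{(assignment for expression } e\text{)} & \quad & \mathrm{lift} : (\mathit{Vars} \to_\bot \mathcal{V}^\sharp) \to \mathcal{R} \\
r|_Y : \mathcal{R} \to \mathcal{R} & \text{(restriction to } Y \subseteq \mathit{Vars}\text{)} & \quad & \mathrm{unlift} : \mathcal{R} \to (\mathit{Vars} \to_\bot \mathcal{V}^\sharp) \\
\llbracket ?e \rrbracket^\sharp_{\mathcal{R}} : \mathcal{R} \to \mathcal{R} & \text{(guard for condition } e\text{)} & &
\end{array}
\]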


The operations to the left provide the abstract state transformers for the basic
operations of programs (with non-deterministic assignments expressed as
restrictions), while lift and unlift allow casting from abstract variable assignments to
the relational domain as well as extracting single-variable information. We
assume that $\mathrm{lift}\ \bot = \bot$ and $\mathrm{unlift}\ \bot = \bot$, and require that $\mathrm{unlift} \circ \mathrm{lift} \sqsubseteq \mathrm{id}$ where $\sqsubseteq$
refers to the ordering of $(\mathit{Vars} \to_\bot \mathcal{V}^\sharp)$. Moreover, we require that the meet
operations $\sqcap$ of $\mathcal{V}^\sharp_\tau$ and $\mathcal{R}$ safely approximate the intersection of the concretizations
of the respective arguments. Restricting a relation $r$ to a subset $Y$ of variables
amounts to forgetting all information about variables not in $Y$. Thus, we demand


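\[
\begin{array}{c}
r|_{\mathit{Vars}} = r, \qquad r|_{\emptyset} = \top, \qquad r|_{Y_1} \sqsupseteq r|_{Y_2} \ \text{when } Y_1 \subseteq Y_2, \qquad (r|_{Y_1})|_{Y_2} = r|_{Y_1 \cap Y_2} \quad \text{and} \\[4pt]
\mathrm{unlift}(r|_Y)\,x = \top \quad (x \notin Y), \qquad \mathrm{unlift}(r|_Y)\,x = (\mathrm{unlift}\ r)\,x \quad (x \in Y) \qquad (1)
\end{array}
\]


Restriction thus is idempotent. For convenience, we also define a shorthand for
assignment of abstract values³:
$\llbracket x \leftarrow^\sharp v \rrbracket^\sharp_{\mathcal{R}}\, r = (r|_{\mathit{Vars} \setminus \{x\}}) \sqcap (\mathrm{lift}\,(\top \oplus \{x \mapsto v\}))$.
In order to construct an abstract interpretation, we further require monotonic
concretization functions $\gamma_{\mathcal{V}^\sharp_\tau} : \mathcal{V}^\sharp_\tau \to 2^{\mathcal{V}_\tau}$ and $\gamma_{\mathcal{R}} : \mathcal{R} \to 2^{\mathit{Vars} \to \mathcal{V}}$ satisfying the
requirements presented in Fig. 1.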


Example 2. As a value domain $\mathcal{V}^\sharp_\tau$, consider the flat lattice over the sets of values
of appropriate type $\tau$. A relational domain $\mathcal{R}_1$ is obtained by collecting satisfiable
conjunctions of equalities between variables or variables and constants where the
ordering is logical implication, extended with False as least element. The greatest
element in this complete lattice is given by True. The operations lift and unlift
for non-$\bot$ arguments then can be defined as

\[
\mathrm{lift}\ \sigma = \bigwedge \{x = \sigma\,x \mid x \in \mathit{Vars},\ \sigma\,x \neq \top\}
\qquad
\mathrm{unlift}\ r\ x =
\begin{cases}
c & \text{if } r \Rightarrow (x = c) \\
\top & \text{otherwise}
\end{cases}
\]

The restriction of $r$ to a subset $Y$ of variables is given by the conjunction of all
equalities implied by $r$ which only contain variables from $Y$ or constants.
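
A minimal executable sketch of the domain from Example 2 may help to make the operations concrete. The encoding is an assumption of this sketch, not the representation used by the authors: a non-$\bot$ relation is a dictionary mapping each constrained variable to a representative, which is either a known constant or the name of its class of equal variables; variables absent from the dictionary are unconstrained.

```python
TOP = "TOP"  # greatest element of the flat value lattice

def normalize(r):
    """Canonical form: name each constant-free class by its least member
    and drop classes consisting of a single unconstrained variable."""
    classes = {}
    for x, rep in r.items():
        classes.setdefault(rep, []).append(x)
    out = {}
    for rep, members in classes.items():
        if rep[0] == "var" and len(members) == 1:
            continue
        name = rep if rep[0] == "const" else ("var", min(members))
        out.update({x: name for x in members})
    return out

def lift(sigma):
    """Conjunction of the equalities x = sigma(x) for all non-TOP values."""
    return {x: ("const", v) for x, v in sigma.items() if v != TOP}

def unlift(r, x):
    """The constant implied for x by r, or TOP if there is none."""
    rep = r.get(x)
    return rep[1] if rep is not None and rep[0] == "const" else TOP

def restrict(r, keep):
    """r|_Y: drop every equality mentioning a variable outside keep."""
    return normalize({x: rep for x, rep in r.items() if x in keep})

# The relation g = h and i = 5.
r = normalize({"g": ("var", "g"), "h": ("var", "g"), "i": ("const", 5)})
print(unlift(r, "i"), unlift(restrict(r, {"g", "h"}), "i"))
```

On this encoding the requirements in (1) can be checked directly, e.g. restricting first to {g,h} and then to {h,i} gives the same result as restricting to {h}.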


In line with Example 2, non-numerical relational domains may also be constructed.
A variable clustering $\mathcal{S} \subseteq 2^{\mathit{Vars}}$ is a set of subsets (clusters) of variables. For
any cluster $Y \subseteq \mathit{Vars}$, let $\mathcal{R}^Y = \{r \mid r \in \mathcal{R},\ r|_Y = r\}$; this set collects all abstract
values from $\mathcal{R}$ containing information on variables in $Y$ only. Given an arbitrary
clustering $\mathcal{S} \subseteq 2^{\mathit{Vars}}$, any relation $r \in \mathcal{R}$ can be approximated by a meet of
relations from $\mathcal{R}^Y$ ($Y \in \mathcal{S}$) since for every $r \in \mathcal{R}$, $r \sqsubseteq \bigsqcap \{r|_Y \mid Y \in \mathcal{S}\}$ holds.
Some relational domains, however, can be fully recovered from their
restrictions to specific subsets of clusters. We consider for $k \ge 1$, the set $\mathcal{S}_k$ of all
non-empty subsets $Y \subseteq \mathit{Vars}$ of cardinality at most $k$. We call a relational
domain $\mathcal{R}$ k-decomposable if each abstract value from $\mathcal{R}$ can be precisely expressed


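as the meet of its restrictions to clusters of $\mathcal{S}_k$ and when all least upper bounds
can be recovered by computing with clusters of $\mathcal{S}_k$ only; that is,

\[
r = \bigsqcap \{r|_Q \mid Q \in \mathcal{S}_k\}
\qquad
\Bigl(\bigsqcup R\Bigr)\Big|_Q = \bigsqcup \{r|_Q \mid r \in R\} \quad (Q \in \mathcal{S}_k)
\qquad (2)
\]

holds for each abstract relation $r \in \mathcal{R}$ and each set of abstract relations $R \subseteq \mathcal{R}$.


³ We use $\sigma \oplus \{x_i \mapsto v_i \mid i = 1, \ldots, m\}$ to denote the variable assignment obtained
from $\sigma$ by replacing the values for $x_i$ with $v_i$ ($i = 1, \ldots, m$).


\[
\begin{array}{l}
\forall a, b \in \mathcal{V}^\sharp_\tau : a \sqsubseteq b \implies \gamma_{\mathcal{V}^\sharp_\tau}\,a \subseteq \gamma_{\mathcal{V}^\sharp_\tau}\,b
\qquad \gamma_{\mathcal{R}}\,\bot = \emptyset
\qquad \forall r, s \in \mathcal{R} : r \sqsubseteq s \implies \gamma_{\mathcal{R}}\,r \subseteq \gamma_{\mathcal{R}}\,s \\[4pt]
\gamma_{\mathcal{R}}(\llbracket x \leftarrow e \rrbracket^\sharp_{\mathcal{R}}\,r) \supseteq \{\sigma \oplus \{x \mapsto \llbracket e \rrbracket \sigma\} \mid \sigma \in \gamma_{\mathcal{R}}\,r\} \\[4pt]
\gamma_{\mathcal{R}}(r|_Y) \supseteq \{\sigma \oplus \{x_1 \mapsto v_1, \ldots, x_m \mapsto v_m\} \mid v_i \in \mathcal{V},\ x_i \in \mathit{Vars} \setminus Y,\ \sigma \in \gamma_{\mathcal{R}}\,r\} \\[4pt]
\gamma_{\mathcal{R}}(\mathrm{lift}\ \sigma^\sharp) \supseteq \{\sigma \mid \forall x : \sigma\,x \in \gamma_{\mathcal{V}^\sharp_\tau}(\sigma^\sharp\,x)\}
\qquad \gamma_{\mathcal{V}^\sharp_\tau}((\mathrm{unlift}\ r)\,x) \supseteq \{\sigma\,x \mid \sigma \in \gamma_{\mathcal{R}}\,r\}
\end{array}
\]


Fig. 1: Required properties for $\gamma_{\mathcal{V}^\sharp_\tau} : \mathcal{V}^\sharp_\tau \to 2^{\mathcal{V}_\tau}$ and $\gamma_{\mathcal{R}} : \mathcal{R} \to 2^{\mathit{Vars} \to \mathcal{V}}$.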


Example 3. The domain $\mathcal{R}_1$ from the previous example is 2-decomposable. This
also holds for the octagon domain [36] and many other weakly relational numeric 
domains (pentagons [33], weighted hexagons [21], logahedra [28], TVPI [53], 
dDBM [46], and AVO [11]). The affine equalities or affine inequalities domains 
[16, 30], however, are not. The relational string domains proposed by Arceri et al. 
[6, Sec. 5.1 - 5.3], are examples of non-numeric 2-decomposable domains. 
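
The first half of the 2-decomposability condition (2) can be illustrated on the fragment of the equality domain that relates variables only (constants are left out here for brevity). In this sketch, which uses a representation chosen for illustration, a relation is a list of its non-trivial classes of equal variables, `restrict` projects to a cluster, and `meet` conjoins relations by merging overlapping classes.

```python
from itertools import combinations

def restrict(partition, cluster):
    """Keep only the equalities among variables of the cluster."""
    return [cls & cluster for cls in partition if len(cls & cluster) > 1]

def meet(relations, variables):
    """Conjunction of equality relations: merge overlapping classes."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for rel in relations:
        for cls in rel:
            first, *rest = sorted(cls)
            for other in rest:
                parent[find(other)] = find(first)
    classes = {}
    for v in variables:
        classes.setdefault(find(v), set()).add(v)
    return sorted((frozenset(c) for c in classes.values() if len(c) > 1), key=sorted)

VARS = ["g", "h", "i", "j"]
r = [frozenset({"g", "h", "i"})]      # g = h = i, j unconstrained

# All clusters of size at most 2, i.e. the set S_2 for these variables.
clusters2 = [frozenset(c) for k in (1, 2) for c in combinations(VARS, k)]
rebuilt = meet([restrict(r, q) for q in clusters2], VARS)
print(rebuilt == r)
```

Restrictions to singleton clusters alone carry no equalities, so for this relation clusters of size 1 do not suffice while clusters of size at most 2 do. The second half of (2), concerning least upper bounds, is not exercised by this sketch.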


3 A Local Trace Semantics 


We build upon the semantic framework for local traces, introduced by Schwarz 
et al. [48]. A local trace records all past events that have affected the present 
configuration of a specific thread, referred to as the ego thread. In [48], the local 
trace semantics is proven equivalent to the global trace semantics which itself is 
equivalent to a global interleaving semantics. In particular, any analysis that is 
sound w.r.t. the local trace semantics also is w.r.t. the interleaving semantics. 
While the framework of Schwarz et al. [48] allows for different formalizations 
of traces, thread synchronization happens only via locking/unlocking and thread 
creation. Generalizing their semantics, we identify certain actions as observable 
by other threads when executing corresponding observing actions (see Table 1). 
When the ego thread executes an observing action, a local trace ending in the
corresponding observable action is incorporated. Here, we consider as
observable/observing actions locking/unlocking mutexes and creating/joining threads.
Consider, e.g., the program in Fig. 2a and a corresponding local trace (Fig. 2b).
This trace consists of one swim lane for each thread representing the sequence
of steps it executed where each node in the graph represents a configuration
attained by it. Additionally, the trace records the create and join orders as well as,
for each mutex a, the locking order for a ($\to_c$, $\to_j$, and $\to_a$, respectively). These


Table 1: Observable and observing actions and which concurrency primitive they
relate to. The primitives targeted by this paper are in bold font.


Observable Action    Observing Action      Programming Concept
-------------------  --------------------  ----------------------------------------
unlock(a)            lock(a)               Mutex, Monitor, ...
return x             x' = join(x'')        Thread Returning / Joining
g = x                x = g                 Writing / Reading a global variable
signal(c)            wait(c)               Condition Variables
send(chan, v)        x = receive(chan)     Channel-Based Concurrency, Sockets, ...
set_value            get                   Futures / Promises


main:                  t1:                    t2:
  x = create(t2);        z = 1;                 z = 12;
  y = create(t1);        z = join(x);           return z;
  lock(mg);              lock(mg);
  g = 1;                 g = 2;
  unlock(mg);            unlock(mg);
  z = 28;                x = create(t2);

(a) Source code

(b) Local trace [diagram not reproduced]. For this program, execution begins
at program point main, and x, y, z are local variables, whereas g is a global
variable. To ensure atomicity, every access to the global g is protected by the
mutex $m_g$, which we omit in the further examples.


Fig. 2: An example program and a corresponding local trace.


orders introduce extra relationships between thread configurations. The unique 
start node of each local trace is an initial configuration of the main thread. 


We distinguish between the sets $\mathcal{X}$ and $\mathcal{G}$ of local and global variables. We
assume that $\mathcal{X}$ contains a special variable self within which the thread id of
the current thread, drawn from the set $\mathcal{I}$, is maintained. A (local) thread
configuration is a pair $(u, \sigma)$ where $u$ is a program point and the type-consistent
map $\sigma : \mathcal{X} \to \mathcal{V}$ provides values for the local variables. The values of globals
are not explicitly represented in a thread configuration, but can be recovered
by consulting the (unique) last write to this global within the local trace. To
model weak memory effects, weaker notions of last writes are conceivable. As in
[48], we consider a set of actions $\mathit{Act}$ that consists of locking and unlocking a
(non-reentrant) mutex from a set $\mathcal{M}$, copying values of globals into locals and
vice-versa, creating a new thread, as well as assignments with and branching on
local variables. We extend $\mathit{Act}$ with actions for returning from and joining with
threads. We assume that writes to and reads from globals are atomic (or more
precisely, we assume copying values of integral type to be atomic). This is
enforced for each global $g$ by a dedicated mutex $m_g$ acquired just before accessing $g$
and released immediately after. For simplicity, we associate traces corresponding
to a write of $g$ to this dedicated mutex $m_g$, and thus do not need to consider
writing and reading of globals as observable/observing actions. In examples, we
omit explicitly locking and unlocking these mutexes. By convention, at program
start all globals have value 0, while local variables may initially have any value.

Each thread is represented by a control-flow graph with edges $e \in \mathcal{E}$ of the
form $e = (u, \mathit{act}, u')$ for some action $\mathit{act} \in \mathit{Act}$ and program points $u$ and $u'$
where the start point of the main thread is $u_0$. Let $\mathcal{T}$ denote the set of all
local traces of a given program. A formalism for local traces must, for each
edge $e$ of the control-flow graph, provide a transformation $\llbracket e \rrbracket : \mathcal{T}^k \to 2^{\mathcal{T}}$ so
that $\llbracket e \rrbracket(t_0, \ldots, t_{k-1})$ extends the local trace $t_0$, possibly incorporating other
local traces. For the operations lock(a), $a \in \mathcal{M}$, or x=join(x'), $x, x' \in \mathcal{X}$, the
arity of $\llbracket e \rrbracket$ is two: another local trace, namely one with last operation unlock(a)
or return x'', respectively, is incorporated. The remaining edge transformations
have arity one. In all cases, the set of resulting local traces may be empty when
the operation is not applicable to its argument(s). We write $\llbracket e \rrbracket(T_0, \ldots, T_{k-1})$
for the set $\bigcup_{t_0 \in T_0, \ldots, t_{k-1} \in T_{k-1}} \llbracket e \rrbracket(t_0, \ldots, t_{k-1})$.

Given definitions of $\llbracket e \rrbracket$, the set $\mathcal{T}$ can be inductively defined starting from
a set init of initial local traces consisting of initial configurations of the main
thread. To develop efficient thread-modular abstractions, we are interested in
subsets $\mathcal{T}[u]$, $\mathcal{T}[a]$, $\mathcal{T}[i]$ of local traces ending at some program point $u$, ending
with an unlock operation for mutex $a$ (or from init), or ending with a return
statement of thread $i$, respectively. Schwarz et al. [48] showed that such subsets
can be described as the least solution of a side-effecting constraint system [5].
There, each right-hand side may, besides its contribution to the unknown on the
left, also provide contributions to other unknowns (the side-effects). This allows
expressing analyses that accumulate flow-insensitive information about globals
during a flow-sensitive analysis of local states with dynamic control flow [51].
Here, in the presence of dynamic thread creation, we use side-effects to express
that an observable action, unlock or return, should also contribute to the sets
$\mathcal{T}[a]$ or $\mathcal{T}[i]$, such that they can be incorporated at the corresponding observing
action. The side-effecting formulation of our concrete semantics takes the form:


\[
(\eta, \eta\,[u_0]) \sqsupseteq (\{[a] \mapsto \mathit{init} \mid a \in \mathcal{M}\},\ \mathit{init})
\qquad
(\eta, \eta\,[u']) \sqsupseteq \llbracket u, \mathit{act} \rrbracket\,\eta \quad ((u, \mathit{act}, u') \in \mathcal{E})
\qquad (3)
\]


where the ordering $\sqsupseteq$ is induced by the superset ordering and right-hand sides
are defined in Fig. 3. A right-hand side takes an assignment $\eta$ of the unknowns
of the system and returns a pair $(\eta', T)$ where $T$ is the contribution to the
unknown occurring on the left (as in ordinary constraint systems). The first
component collects the side-effects as the assignment $\eta'$. If the right-hand sides
are monotonic, Eq. (3) has a unique least solution.
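
To make the shape of such a system concrete, the following sketch computes the least solution of a tiny side-effecting constraint system by naive round-robin iteration. It is only an illustration under simplifying assumptions: unknowns are plain strings, local traces are modelled as strings, and the function `solve` as well as the two right-hand sides are hypothetical names introduced here rather than part of the formal development.

```python
def solve(constraints, unknowns):
    """Least solution by naive iteration.

    constraints: list of (lhs, rhs) where rhs(eta) returns a pair
    (side_effects, value); side_effects maps unknowns to sets and value
    is the contribution to the unknown lhs.
    """
    eta = {x: frozenset() for x in unknowns}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in constraints:
            side_effects, value = rhs(eta)
            for x, contribution in [(lhs, value), *side_effects.items()]:
                new = eta[x] | frozenset(contribution)
                if new != eta[x]:
                    eta[x] = new
                    changed = True
    return eta

def init_rhs(eta):
    """Start point: contributes init to [u0] and, as side-effect, to [a]."""
    traces = {"init"}
    return {"[a]": traces}, traces

def unlock_rhs(eta):
    """Edge (u0, unlock(a), u1): extends traces and side-effects them to [a]."""
    traces = {t + ";unlock(a)" for t in eta["u0"]}
    return {"[a]": traces}, traces

constraints = [("u0", init_rhs), ("u1", unlock_rhs)]
eta = solve(constraints, ["u0", "u1", "[a]"])
print(sorted(eta["[a]"]))
```

The unknown `[a]` never occurs on a left-hand side; it receives its value solely through side-effects, which is the mechanism used for the sets $\mathcal{T}[a]$ and $\mathcal{T}[i]$ above. Practical solvers for such systems are demand-driven; the round-robin loop is used here only for brevity.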

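We only detail the right-hand sides for the creation of threads as well as the
new actions join and return; the rest remain the same as defined by Schwarz
et al. [48]. For thread creation, they provide the action x=create($u_1$). Here,
$u_1$ is the program point at which the created thread should start. We assume
that all locals from the creator are passed to the created thread, except for the
variable self. The variables self in the created thread and x in the creating thread
receive a fresh thread id. Here, $\mathrm{new}\ u\ u_1\ t$ computes the local trace at the start
point $u_1$ from the local trace $t$ of the creating thread. To handle returning and
joining of threads we introduce the following two actions:


\[
\begin{array}{l}
\llbracket u, \texttt{lock(a)} \rrbracket\,\eta = (\emptyset,\ \llbracket e \rrbracket(\eta\,[u],\ \eta\,[a])) \\[4pt]
\llbracket u, \texttt{unlock(a)} \rrbracket\,\eta = \textbf{let } T = \llbracket e \rrbracket(\eta\,[u]) \textbf{ in } (\{[a] \mapsto T\},\ T) \\[4pt]
\llbracket u, \texttt{x = g} \rrbracket\,\eta = (\emptyset,\ \llbracket e \rrbracket(\eta\,[u])) \\[4pt]
\llbracket u, \texttt{g = x} \rrbracket\,\eta = (\emptyset,\ \llbracket e \rrbracket(\eta\,[u])) \\[4pt]
\llbracket u, \texttt{x=create}(u_1) \rrbracket\,\eta = \textbf{let } T = \llbracket e \rrbracket(\eta\,[u]) \textbf{ in } (\{[u_1] \mapsto \mathrm{new}\ u\ u_1\ (\eta\,[u])\},\ T) \\[4pt]
\llbracket u, \texttt{x=join(x')} \rrbracket\,\eta = \textbf{let } T = \eta\,[u] \textbf{ in } (\emptyset,\ \llbracket e \rrbracket(\eta\,[u],\ \bigcup \{\eta\,[t(x')] \mid t \in \eta\,[u]\})) \\[4pt]
\llbracket u, \texttt{return x} \rrbracket\,\eta = \textbf{let } T = \eta\,[u] \textbf{ in } (\{[i] \mapsto \llbracket e \rrbracket(\{t \in T \mid t(\mathit{self}) = i\}) \mid i \in \mathcal{I}\},\ \llbracket e \rrbracket\,T)
\end{array}
\]


Fig. 3: Right-hand sides for side-effecting formulation of concrete semantics; $t(y)$
extracts the value of local variable $y$ from the terminal configuration of trace $t$.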


— return x; terminates a thread and returns the value of the local variable
x to a thread waiting for the given thread to terminate.

— x=join(x’); where x’ is a local variable holding a thread id, blocks the
ego thread until the thread with the given thread id has terminated. As in
pthreads, at most one thread may call join for a given thread id. The value
provided to return by the joined thread is assigned to the local variable x.


For returning results and realization of join, we employ the unknown [i] for the 
thread id i of the returning thread, as shown in Fig. 3. 


4 Relational Analyses as Abstractions of Local Traces 


Subsequently, we give relational analyses of the values of globals which we base 
on the local trace semantics. They are generic in the relational domain R, with 
2-decomposable domains being particularly well-suited, as the concept of clusters 
is central to the analyses. We focus on relations between globals that are jointly 
write-protected by some mutex. We assume we are given for each global $g$, a set
$\mathcal{M}[g]$ of (write) protecting mutexes, i.e., mutexes that are always held when $g$
is written. Let $\mathcal{G}[a] = \{g \in \mathcal{G} \mid a \in \mathcal{M}[g]\}$ denote the set of globals protected by
a mutex $a$. Let $\emptyset \neq \mathcal{Q}_a \subseteq 2^{\mathcal{G}[a]}$ be the set of clusters of these globals we associate
with $a$. For technical reasons, we require at least one cluster per mutex $a$, which
may be the empty cluster $\emptyset$, thus not associating any information with $a$.

Our basic idea is to store at the unknown $[a, Q]$ (for each mutex $a$ and cluster
$Q \in \mathcal{Q}_a$) an abstraction of the relations only between globals in $Q$. By
construction, all globals in $Q$ are protected by $a$. Whenever $a$ is locked, the relational
information stored at all $[a, Q]$ is incorporated into the local state by the lattice
operation meet, i.e., the local state now maintains relations between locals as
well as globals which no other thread can access at this program point.
Whenever $a$ is unlocked, the new relation between globals in all corresponding clusters
$Q \in \mathcal{Q}_a$ is side-effected to the respective unknowns $[a, Q]$. Simultaneously, all
information on globals no longer protected is forgotten to obtain the new local
state. In this way, the analysis is fully relational in the local state, while only
keeping relations within clusters of globals jointly protected by some mutex.

For clarity of presentation, we perform control-point splitting on the set of
held mutexes when reaching program points. Apart from this, the constraint
system and right-hand sides for the analysis closely follow those of the concrete
semantics (Section 3), with the exception that unknowns now take values from
$\mathcal{R}$ and that unknowns $[a]$ are replaced with unknowns $[a, Q]$ for $Q \in \mathcal{Q}_a$.

All right-hand sides are given in detail in Fig. 4. For the start point of the
program and the empty lockset, the right-hand side init♯ returns the ⊤ relation
updated such that the variable self holds the abstract thread id i0 of the main
thread. Additionally, init♯ produces a side-effect for each mutex a and cluster Q
that initializes all globals from the cluster with the value 0.

init♯ η =
  let r(Q) = ⟦{g ← 0 | g ∈ Q}⟧♯_R ⊤ in
  let ρ = {[a, Q] ↦ r(Q) | a ∈ M, Q ∈ Qa} in
  (ρ, ⟦self ←♯ i0⟧♯_R ⊤)

⟦[u, S], x=create(u1)⟧♯ η =
  let r = η [u, S] in
  let i = ν♯ u u1 r in
  let r' = ⟦self ←♯ i⟧♯_R (r|_X) in
  let ρ = {[u1, ∅] ↦ r'} in
  (ρ, ⟦x ←♯ i⟧♯_R r)

⟦[u, S], g = x⟧♯ η = (∅, ⟦g ← x⟧♯_R (η [u, S]))

⟦[u, S], x = g⟧♯ η = (∅, ⟦x ← g⟧♯_R (η [u, S]))

⟦[u, S], lock(a)⟧♯ η = (∅, η [u, S] ⊓ (⊓_{Q ∈ Qa} η [a, Q]))

⟦[u, S], unlock(a)⟧♯ η =
  let r = η [u, S] in
  let ρ = {[a, Q] ↦ r|_Q | Q ∈ Qa} in
  (ρ, r|_{X ∪ ⋃{G[a'] | a' ∈ (S \ a)}})

⟦[u, S], return x⟧♯ η =
  let r = η [u, S] in
  let i♯ = unlift r self in
  let ρ = {[i♯] ↦ (⟦ret ← x⟧♯_R r)|_{ret}} in
  (ρ, r)

⟦[u, S], x'=join(x)⟧♯ η =
  let v = ⊔ {unlift (η [i']) ret | i' a possibly joined thread id, i.e., a possible value of x in η [u, S]} in
  (∅, ⟦x' ←♯ v⟧♯_R (η [u, S]))

Fig. 4: Right-hand sides for the basic analysis. All functions are strict in ⊥
(describing the empty set of local traces); we only display definitions for non-⊥
abstract values here. ⟦{g ← 0 | g ∈ Q}⟧♯_R is shorthand for the abstract
transformer corresponding to the assignment of 0 to all variables in Q one-by-one.

For a thread creating edge starting in program point u with lockset S, the
right-hand side ⟦[u, S], x=create(u1)⟧♯ first generates a new abstract thread id,
which we assume can be computed using function ν♯. The new id is assigned to
the variable x in the local state of the current thread. Additionally, the start state
r' for the newly created thread is constructed and side-effected to the thread's
start point with empty lockset [u1, ∅]. Since threads start with empty lockset,
the state r' is obtained by removing all information about globals from the local
state of the creator and assigning the new abstract thread id to the variable self.

When locking a mutex a, the states stored at unknowns [a, Q] with Q ∈ Qa
are combined with the local state by meet. This is sound because the value stored
at any [a, Q] only maintains relationships between variables write-protected by
a, and these values soundly account for the program state at every unlock(a)
and at program start. When unlocking a, on the other hand, the local state
restricted to the appropriate clusters Q ∈ Qa is side-effected to the respective
unknowns [a, Q], so that the changes made to variables in the cluster become
visible to other threads. Also, the local state is restricted to the local variables
and only those globals for which at least one protecting mutex is still held.
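The interplay of meet at lock and restriction at unlock can be illustrated with a minimal sketch. It assumes a toy relational domain that is not taken from the paper: a relation is a finite set of rows (partial valuations), meet is the natural join, and restriction is projection. All names (`meet`, `restrict`, `lock`, `unlock`, `row`) are hypothetical; a solver would additionally join each side-effect into the current value of the unknown.

```python
from itertools import product

# A relation is a frozenset of rows; a row is a frozenset of (variable, value)
# pairs. Variables missing from a row are unconstrained.
TOP = frozenset({frozenset()})   # one empty row: no constraint at all


def row(**values):
    return frozenset(values.items())


def meet(r1, r2):
    """Natural join: keep pairs of rows that agree on shared variables."""
    out = set()
    for x, y in product(r1, r2):
        merged = dict(x)
        if all(merged.get(var, val) == val for var, val in y):
            merged.update(y)
            out.add(frozenset(merged.items()))
    return frozenset(out)


def restrict(r, variables):
    """Project every row onto the given variables."""
    return frozenset(frozenset((k, v) for k, v in rw if k in variables)
                     for rw in r)


def lock(eta, local, a, clusters):
    """Meet the local state with the values of all unknowns [a, Q]."""
    for q in clusters[a]:
        local = meet(local, eta[(a, q)])
    return local


def unlock(local, a, clusters, local_vars, M, held):
    """Return (side-effects to [a, Q], new local state) for unlock(a)."""
    rho = {(a, q): restrict(local, q) for q in clusters[a]}
    remaining = held - {a}
    keep = set(local_vars) | {g for g, ms in M.items() if ms & remaining}
    return rho, restrict(local, keep)


Q = frozenset({"g", "h"})
M = {"g": {"a"}, "h": {"a"}}
clusters = {"a": [Q]}
eta = {("a", Q): frozenset({row(g=0, h=0), row(g=5, h=5)})}

after_lock = lock(eta, frozenset({row(x=1)}), "a", clusters)
after_write = frozenset({row(x=1, g=7, h=7)})       # state after g = 7; h = 7
rho, after_unlock = unlock(after_write, "a", clusters, {"x"}, M, {"a"})
```

In this sketch, locking a makes the relation g = h from [a, {g, h}] available next to the local x, while unlocking publishes the relation on {g, h} and keeps only x locally, because no protecting mutex of g or h remains held.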

As special mutexes m_g immediately surrounding accesses to g are used to
ensure atomicity, and information about g is associated with them, all reads and
writes refer to the local copy of g. Guards and assignments (which may only
involve local variables) are defined analogously. For a return edge, the abstract
value to be returned is looked up in the local state and then side-effected to the
abstract thread id of the current thread (as the value of the dedicated variable
ret). For join, the least upper bound of all return values of all possibly joined
threads is assigned to the left-hand side of the join statement in the local state.


Example 4. Consider the program⁴ where M[g] = {a, b, m_g}, M[h] = {a, b, m_h},
Qa = {{g, h}}, Qb = {{g, h}}.

main:
  x = create(t1); y = ?;
  lock(a); lock(b);
  g = y; h = y+9;
  unlock(b); lock(b);
  h = y;
  // ASSERT(g==y); (1)
  // ASSERT(h==y); (2)
  unlock(b); unlock(a);
  x = create(t2);

t1:
  lock(b);
  unlock(b);
  lock(a);
  lock(b);
  // ASSERT(g==h); (3)
  y = ?; g = y; h = y;
  unlock(b);
  unlock(a);

t2:
  lock(b);
  lock(a);
  // ASSERT(g==h); (4)
  unlock(a);
  unlock(b);


Our analysis succeeds in proving all assertions here. Thread t2 is of particular
interest: When locking b, only g ≤ h is known to hold, and locking the additional
mutex a means that the better information g = h becomes available. The analysis
by Mukherjee et al. [42], on the other hand, only succeeds in proving assertion
(2), even when all globals are put in the same region. It cannot establish
(1) because all correlations between locals and globals are forgotten when the
mix operation is applied at the second lock(b) in the main thread. (3) cannot
be established because, at lock(b) in t1, the mix operation also incorporates the
state after the first unlock(b) in the main thread, where g = h does not hold.
Similarly for (4). The same applies for assertion (3) and the analysis using lock
invariants proposed by Miné [39]. This analysis also falls short of showing (1), as
at the lock(b) in the main thread, the lock invariant associated with b is joined
into the local state. (4) is similarly out of reach. The same reasoning also applies
to [39, 42, 48] after equipping the analyses with thread ids.


Theorem 1. Any solution of the constraint system is sound w.r.t. the local trace 
semantics. 


Proof. The proof is by fixpoint induction; the details are given in Appendix B
of the extended version [49] of this paper.


We remark that, instead of relying on M[g] being pre-computed, an analysis can
also infer this information on the fly [58].

The analysis, however, still has some deficiencies. All writes to a global are
accumulated regardless of the writing thread. As a consequence, a thread does,
e.g., not only read its latest local writes but also all earlier local writes, even if
those are definitely overwritten. Excluding some threads' writes is an instance
of the more general idea of excluding writes that cannot be last writes. Instead
of giving ad hoc remedies for this specific shortcoming, we propose a general
mechanism to improve the precision of any thread-modular analysis in the next
section, and later instantiate it to the issue highlighted here.

⁴ In all examples, g, h, and i are globals, whereas x, y, and z are locals, and the
clusters at special mutexes m_g contain only g: Q_{m_g} = {{g}}. Unless explicitly
stated otherwise, domain R1 from Example 2, enhanced with variable inequalities,
is used.


5 Refinement via Finite Abstractions of Local Traces 


To improve precision of thread-modular analyses we take additional abstractions 
of local traces into account. Our approach is generic, building on the right-hand 
sides of a base analysis and using them as black boxes. We will instantiate this 
framework to exclude writes based on thread ids from the analysis in Section 4. 
Other instantiations are conceivable as well. To make it widely applicable, the 
framework allows base analyses that already perform some splitting of unknowns 
at program points (e.g., locksets in Section 4). We denote by [û] such (possibly) 
extended unknowns for a program point u. A (base) analysis is defined by its 
right-hand sides, and a collection of domains: (1) D_S for abstract values stored
at unknowns for program points; (2) D_act for abstract values stored at observable
actions act (e.g., in Section 4, D_M for unlocks and D_T for thread returns).

Let 𝒜 be a set of finite information that can be extracted from a local trace
by a function α_𝒜 : 𝒯 → 𝒜. We call α_𝒜 t ∈ 𝒜 the digest of some local trace t. Let
⟦u, act⟧♯_𝒜 : 𝒜^k → 2^𝒜 be the effect on the digest when performing a k-ary action
act ∈ Act for a control flow edge originating at u. We require for e = (u, act, v) ∈ E,


  ∀A0, ..., A(k−1) ∈ 𝒜 : |⟦u, act⟧♯_𝒜 (A0, ..., A(k−1))| ≤ 1
  ∀t0, ..., t(k−1) ∈ 𝒯 : α_𝒜 (⟦e⟧ (t0, ..., t(k−1))) ⊆ ⟦u, act⟧♯_𝒜 (α_𝒜 t0, ..., α_𝒜 t(k−1))      (4)


where α_𝒜 is lifted element-wise to sets. While the first restriction ensures
determinism, the second intuitively ensures that ⟦u, act⟧♯_𝒜 soundly abstracts ⟦e⟧.

For thread creation, we additionally require a helper function new♯_𝒜 : N →
N → 𝒜 → 2^𝒜 that returns, for a thread created at an edge originating from u and
starting execution at program point u1, the new digest. The same requirements
are imposed for edges (u, x=create(u1), v) ∈ E,

  ∀A0 ∈ 𝒜 : |new♯_𝒜 u u1 A0| ≤ 1      ∀t0 ∈ 𝒯 : α_𝒜 (new u u1 t0) ⊆ new♯_𝒜 u u1 (α_𝒜 t0)      (5)

Also, we define for the initial digest at the start of the program

  init♯_𝒜 = {α_𝒜 t | t ∈ init}      (6)


Under these assumptions, we can perform control-point splitting according to
𝒜. This means that unknowns [û] for program points u are replaced with new
unknowns [û, A], A ∈ 𝒜. Analogously, unknowns for observable actions [act] are
replaced with unknowns [act, A] for A ∈ 𝒜. Consider a single constraint from
an abstract constraint system of the last section, which soundly abstracts the
collecting local trace semantics of a program.


  (η, η [v̂]) ⊒ ⟦[û], act⟧♯ η


⟦[û, A0], act, A1⟧♯ η =
  let η' [x] = if [x] = [û] then η [û, A0]
               else η [x, A1]
  in ⟦[û], act⟧♯ η'

⟦[û, A0], act', A'⟧♯ η =
  let η' [x] = η [x, A0] in
  let (ρ, v) = ⟦[û], act'⟧♯ η' in
  let ρ' = {[z, A'] ↦ v' | ([z] ↦ v') ∈ ρ} in
  (ρ', v)

⟦[û, A0], act''⟧♯ η =
  let η' [x] = η [x, A0] in
  ⟦[û], act''⟧♯ η'

⟦[û, A0], x=create(u1)⟧♯ η =
  let η' [x] = η [x, A0] in
  let ({[û1] ↦ v'}, v) = ⟦[û], x=create(u1)⟧♯ η' in
  ({[û1, A'] ↦ v' | A' ∈ new♯_𝒜 u u1 A0}, v)

Fig. 5: Right-hand sides for an observing action act, an observable action act', a
create action, and an action act'' that is neither for the refined analyses, defined
as wrappers around the right-hand sides of a base analysis.


The corresponding constraints of the refined system with control-point splitting
differ based on whether the action act is observing, observable, or neither.


- When act is observing, the new right-hand side additionally gets the digest
  A1 associated with the local traces that are to be incorporated:

    (η, η [v̂, A']) ⊒ ⟦[û, A0], act, A1⟧♯ η    for A0, A1 ∈ 𝒜, A' ∈ ⟦u, act⟧♯_𝒜 (A0, A1)

- When act is observable, the digest A' of the resulting local trace is passed,
  so the side-effect can be redirected to the appropriate unknown:

    (η, η [v̂, A']) ⊒ ⟦[û, A0], act, A'⟧♯ η    for A0 ∈ 𝒜, A' ∈ ⟦u, act⟧♯_𝒜 (A0)

- When act is neither, no additional digest is passed:

    (η, η [v̂, A']) ⊒ ⟦[û, A0], act⟧♯ η    for A0 ∈ 𝒜, A' ∈ ⟦u, act⟧♯_𝒜 (A0)


The new right-hand sides are defined in terms of the right-hand sides of the base
analysis, which are used as black boxes (Fig. 5). They act as wrappers, mapping
any unknown consulted or side-effected to by the original analysis to the
appropriate unknown of the refined system. Thus, the refined analysis automatically
benefits from the extra information the digests provide. It may, e.g., exploit that
⟦u, act⟧♯_𝒜 (A0, A1) = ∅, meaning that no local traces with digests A0, A1 can be
combined into a valid local trace ending with action act. The complete definition
of the refined constraint system instantiated to the actions considered here and
unknowns for program points enriched with locksets is given in [49, Fig. 14].
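The wrapper construction can be sketched in a few lines. The sketch below assumes that a right-hand side is a function taking a lookup function η and returning a pair (side-effects, value), and that refined unknowns are pairs (unknown, digest); these encodings and the names `refine_observing`, `refine_observable`, and `refine_neither` are illustrative assumptions, not part of the formal development.

```python
def refine_observing(base_rhs, u_hat, a0, a1):
    """Observing action: [û] is read at digest A0, every other unknown at A1."""
    def rhs(eta):
        def eta_prime(x):
            return eta((x, a0)) if x == u_hat else eta((x, a1))
        return base_rhs(eta_prime)
    return rhs


def refine_observable(base_rhs, a0, a_new):
    """Observable action: reads use A0, side-effects are redirected to A'."""
    def rhs(eta):
        rho, v = base_rhs(lambda x: eta((x, a0)))
        return {(z, a_new): val for z, val in rho.items()}, v
    return rhs


def refine_neither(base_rhs, a0):
    """Any other action: all reads are redirected to digest A0."""
    def rhs(eta):
        return base_rhs(lambda x: eta((x, a0)))
    return rhs


# Toy base right-hand sides over sets (meet is intersection).
store = {("u", "A0"): {1, 2}, ("a", "A1"): {2, 3}}
eta = store.__getitem__
base_lock = lambda e: ({}, e("u") & e("a"))        # reads [u] and [a]
base_unlock = lambda e: ({"a": e("u")}, e("u"))    # side-effects to [a]

lock_result = refine_observing(base_lock, "u", "A0", "A1")(eta)
unlock_result = refine_observable(base_unlock, "A0", "A2")(eta)
```

The base right-hand sides are never inspected, which mirrors their use as black boxes: only the lookup function handed to them and the targets of their side-effects are adapted.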

Enriching program points with locksets can in fact be seen as a first applica- 
tion of this framework. The right-hand sides are given in Fig. 6. 
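A minimal sketch of the lockset digest, following the definitions of Fig. 6, may help to see that the determinism requirement is met. Locksets are encoded as frozensets; the function names are hypothetical.

```python
def init_digest():
    """The program starts with the empty lockset."""
    return {frozenset()}


def new_digest(u, u1, S):
    """A newly created thread starts with the empty lockset."""
    return {frozenset()}


def lock_digest(u, a, S, S_other):
    """lock(a) is observing: the digest S' of the incorporated trace is ignored."""
    return {frozenset(S) | {a}}


def unlock_digest(u, a, S):
    return {frozenset(S) - {a}}


def other_digest(u, S, S_other=None):
    """All other actions leave the lockset unchanged."""
    return {frozenset(S)}
```

Every function returns a set with exactly one element, so the refined system has one successor unknown per edge and digest.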


Example 5. As a further instance, consider tracking which mutexes have been
locked at least once in the local trace. At lock(a), traces in which a thread has
performed a lock(a) cannot be combined with traces that contain no lock(a). The
corresponding right-hand sides are given in Fig. 7. When refining the analysis
from Section 4 accordingly (assuming a protects g and h), it succeeds in proving
the assert in this program as the initial values of 0 for g and h can be excluded.

main:
  lock(a);
  h = 9; g = 10;
  unlock(a);
  x = create(t1);

t1:
  x = create(t2);
  lock(a);
  h = 11; g = 12;
  unlock(a);

t2:
  lock(a);
  // ASSERT(h<=g);
  unlock(a);

This naturally generalizes to counting how often some action (e.g., a write to a
global g) occurred, stopping exact bookkeeping at a constant (1 in this case).
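The digest of Example 5 can be sketched as follows. The shape of the result for the admissible case (the union of both digests extended by a) is read off the damaged Fig. 7 and should be treated as an assumption; `capped_increment` is a hypothetical helper illustrating the generalization to bounded counting.

```python
def lock_seen(a, L, L_other):
    """Digest effect of lock(a): L is the ego thread's set of mutexes locked
    at least once, L_other that of the local trace being incorporated."""
    if a in L and a not in L_other:
        # The ego thread locked a before, so the trace ending in the last
        # unlock(a) must contain a lock(a): this combination is impossible.
        return set()
    return {frozenset(L) | frozenset(L_other) | {a}}


def capped_increment(n, cap=1):
    """Count occurrences exactly up to `cap`; `cap` then means 'cap or more'."""
    return min(n + 1, cap)
```

With cap = 1, the counter degenerates to the "at least once" flag used in the example.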


To prove soundness of local-trace-based refinement of our analysis from Section 4,
we first construct a corresponding refined collecting local trace semantics.
Then we verify that the refined analysis is sound w.r.t. this refined semantics,
which, in turn, is proven sound w.r.t. the original collecting local trace semantics.


Theorem 2. Assume that α_𝒜, new♯_𝒜, and ⟦u, act⟧♯_𝒜 fulfill requirements (4), (5),
and (6). Then any solution of the refined constraint system is sound relative to
the collecting local trace semantics.


Proof. A proof sketch instantiated with the actions considered here and unknowns
enriched with locksets is provided in [49, Appendix D].


6 Analysis of Thread Ids and Uniqueness 


We instantiate the scheme from the previous section to compute abstract thread 
ids and their uniqueness. That refinement of the base analysis enhances precision 
of the analysis by excluding reads, e.g., from threads that have not yet been 
started. For that, we identify threads by their thread creation history, i.e., by 
sequences of create edges. As these sequences may grow arbitrarily, we collect all 
creates occurring after the first repetition into a set to obtain finite abstractions. 


Example 6. In the program from Fig. 8, the first thread created by main receives
the abstract thread id (main · ⟨u1, t1⟩, ∅). It creates a thread with abstract thread
id (main · ⟨u1, t1⟩ · ⟨u3, t1⟩, ∅). At program point u3, the latter creates a thread
starting at t1 and receiving the abstract thread id (main · ⟨u1, t1⟩, {⟨u3, t1⟩}),
as do all threads subsequently created at this edge.


init♯_𝒜 = {∅}
new♯_𝒜 u u1 S = {∅}
⟦u, lock(a)⟧♯_𝒜 (S, S') = {S ∪ {a}}
⟦u, unlock(a)⟧♯_𝒜 S = {S \ {a}}
⟦u, a⟧♯_𝒜 S = {S}          (other non-observing)
⟦u, a⟧♯_𝒜 (S, S') = {S}    (other observing)


Fig. 6: Right-hand sides for expressing locksets as a refinement.




Create edges, however, may also be repeatedly encountered within the creating 
thread, in a loop. To deal with this, we track for each thread, the set C of possibly 
already encountered create edges. As soon as a create edge is encountered again, 
the created thread receives a non-unique thread id. 


Example 7. The first time the main thread reaches program point u2 in the
program from Fig. 8, the created thread is assigned the unique abstract thread
id (main · ⟨u2, t1⟩, ∅). In subsequent loop iterations, the created threads are no
longer kept separate, and thus receive the non-unique id (main, {⟨u2, t1⟩}).


Formally, let N_C, N_S denote the subsets of program points with outgoing edge
labeled x=create(...), and of starting points of threads, respectively. Let P ⊆
N_C × N_S denote sets of pairs relating thread creation nodes with the starting
points of the created threads. The set I♯ of abstract thread ids then consists of all
pairs (i, s) ∈ (main·P*) × 2^P in which each pair ⟨u, f⟩ occurs at most once. Given
the set I♯, we require that there is a concretization γ : I♯ → 2^I and a function
single : I♯ → V♯ with γ i♯ ⊆ γ_V♯ (single i♯). The abstract thread id of the main
thread is given by (main, ∅). Therein, the elements in (main·P*) × {∅} represent
the unique thread ids representing at most one concrete thread id, while the
elements (i, s), s ≠ ∅, are ambiguous, i.e., may represent multiple concrete thread
ids. Moreover, we maintain the understanding that the concretizations of distinct
abstract thread ids from I♯ all are disjoint.

As refining information 𝒜 we consider not only abstract thread ids, but
additionally track sets of executed thread creations within the current thread.
Accordingly, we set 𝒜 = I♯ × 2^P and define the right-hand sides as seen in Fig. 9,
where ī denotes the set of pairs occurring in the sequence i.


Example 8. Consider again the program from Fig. 8 with right-hand sides from
Fig. 9, and assume that the missing right-hand side for join returns its first
argument. The initial thread has the abstract thread id i0 = (main, ∅). At its start
point, the digest thus is (i0, ∅). At the create edge originating at u1, a new thread
with id (main · ⟨u1, t1⟩, ∅) is created. The digest for this thread then is
((main · ⟨u1, t1⟩, ∅), ∅). For the main thread, the encountered create edge ⟨u1, t1⟩
is added to the second component of the digest, making it (i0, {⟨u1, t1⟩}).

When u2 is reached with (i0, {⟨u1, t1⟩}), a unique thread with id
(main · ⟨u2, t1⟩, ∅) is created. The new digest of the creating thread then is
(i0, {⟨u1, t1⟩, ⟨u2, t1⟩}). In subsequent iterations of the loop, for which u2 is
reached with (i0, {⟨u1, t1⟩, ⟨u2, t1⟩}), a non-unique thread with id
(main, {⟨u2, t1⟩}) is created.


init♯_𝒜 = {∅}
new♯_𝒜 u u1 L = {L}
⟦u, a⟧♯_𝒜 L = {L}                (other non-observing)
⟦u, a⟧♯_𝒜 (L, L') = {L ∪ L'}     (other observing)
⟦u, lock(a)⟧♯_𝒜 (L, L') = ∅                 if a ∈ L ∧ a ∉ L'
                          {L ∪ L' ∪ {a}}    otherwise


Fig. 7: Right-hand sides for refining according to encountered lock operations.

When reaching u3 with id (main, {⟨u2, t1⟩}), a thread with id (main, {⟨u2, t1⟩,
⟨u3, t1⟩}) is created as the id of the creating thread was already not unique. When
reaching it with the id (main · ⟨u1, t1⟩, ∅), a new thread with id (main · ⟨u1, t1⟩ ·
⟨u3, t1⟩, ∅) is created. When the newly created thread reaches this program point,
the threads created there have the non-unique id (main · ⟨u1, t1⟩, {⟨u3, t1⟩}), as
⟨u3, t1⟩ already appears in the id of the creating thread.
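The id computation traced in Example 8 can be replayed with a short sketch. It is reconstructed from the behaviour described in Examples 6 to 8 rather than copied from Fig. 9, so the exact case split in `compose` is an assumption. A thread id (d, s) is encoded as a tuple d of create edges (the leading main is implicit) and a frozenset s; `new` returns the single resulting digest.

```python
MAIN = ((), frozenset())          # (main, ∅); the leading "main" is implicit


def compose(tid, edge):
    """(d, s) ∘ ⟨u, u1⟩: extend the creation history by one create edge."""
    d, s = tid
    if edge in d:
        # Repetition: cut the sequence before the first occurrence of the
        # edge and move the edge and everything after it into the set.
        k = d.index(edge)
        return d[:k], s | frozenset(d[k:])
    if not s:
        return d + (edge,), s     # still unique: append to the sequence
    return d, s | {edge}          # already ambiguous: only grow the set


def new(u, u1, digest):
    """Digest of the thread created at edge ⟨u, u1⟩ by a thread with `digest`."""
    (d, s), C = digest
    edge = (u, u1)
    d2, s2 = compose((d, s), edge)
    if not s2 and edge in C:
        # The creator has passed this create edge before: non-unique child.
        return (d, frozenset({edge})), frozenset()
    return (d2, s2), frozenset()


def create(u, u1, digest):
    """Digest of the creating thread after passing the create edge ⟨u, u1⟩."""
    tid, C = digest
    return tid, C | {(u, u1)}


e1, e2, e3 = ("u1", "t1"), ("u2", "t1"), ("u3", "t1")
main0 = (MAIN, frozenset())
child1 = new("u1", "t1", main0)
main1 = create("u1", "t1", main0)
first_u2 = new("u2", "t1", main1)
main2 = create("u2", "t1", main1)
later_u2 = new("u2", "t1", main2)
grandchild = new("u3", "t1", child1)
great_grandchild = new("u3", "t1", grandchild)
```

Here `later_u2` carries the non-unique id (main, {⟨u2, t1⟩}) and `great_grandchild` the id (main · ⟨u1, t1⟩, {⟨u3, t1⟩}), matching the example.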


Abstract thread ids should provide us with functions


- unique : I♯ → bool tells whether a thread id is unique.

- lcu_anc : I♯ → I♯ → I♯ returns the last common unique ancestor of two threads.

- may_create : I♯ → I♯ → bool checks whether a thread may (transitively)
  create another.


For our domain I♯, these can be defined as unique (i, s) = (s = ∅) and


  lcu_anc (i, s) (i', s') = (longest common prefix i i', ∅)
  may_create (i, s) (i', s') = (ī ∪ s) ⊆ (ī' ∪ s')


We use this extra information to enhance the definitions of ⟦u, lock(a)⟧♯_𝒜 and
⟦u, x'=join(x)⟧♯_𝒜 to take into account that the ego thread cannot acquire a mutex
from another thread or join a thread that has definitely not yet been created.
This is the case for a thread t'


(1) that is directly created by the unique ego thread, but the ego thread has not
    yet reached the program point where t' is created;

(2) whose thread id indicates that a thread that has not yet been created
    according to (1) is part of the creation history of t'.


Accordingly, we introduce the predicate may_run (i, C) (i', C') defined as


  (lcu_anc i i' = i) ⟹ ∃⟨u, u'⟩ ∈ C : ((i ∘ ⟨u, u'⟩) = i' ∨ may_create (i ∘ ⟨u, u'⟩) i')

which is false whenever thread i' is definitely not yet started. We then set


  ⟦u, lock(a)⟧♯_𝒜 (i, C) (i', C') = ⟦u, x'=join(x)⟧♯_𝒜 (i, C) (i', C') =
      {(i, C)}    if may_run (i, C) (i', C')
      ∅           otherwise
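The predicates can be sketched over the same encoding of thread ids as tuples of create edges paired with a frozenset. The helper `compose` is a reconstruction of ∘ from the worked examples and thus an assumption. The explicit guard for i = i' in `may_run` is also an assumption of this sketch: it reflects the remark in Example 9 that a thread still reads its own writes in this refinement.

```python
MAIN = ((), frozenset())          # (main, ∅); the leading "main" is implicit


def compose(tid, edge):
    """(d, s) ∘ ⟨u, u'⟩, reconstructed from the worked examples."""
    d, s = tid
    if edge in d:
        k = d.index(edge)
        return d[:k], s | frozenset(d[k:])
    if not s:
        return d + (edge,), s
    return d, s | {edge}


def unique(tid):
    return not tid[1]


def lcu_anc(t1, t2):
    """Last common unique ancestor: longest common prefix, empty set."""
    (d1, _), (d2, _) = t1, t2
    k = 0
    while k < min(len(d1), len(d2)) and d1[k] == d2[k]:
        k += 1
    return d1[:k], frozenset()


def may_create(t1, t2):
    (d1, s1), (d2, s2) = t1, t2
    return (set(d1) | s1) <= (set(d2) | s2)


def may_run(digest, other):
    (i, C), (i2, _) = digest, other
    if i == i2:                   # assumption: the ego thread itself runs
        return True
    if lcu_anc(i, i2) != i:       # i is not a unique ancestor of i2
        return True
    return any(compose(i, e) == i2 or may_create(compose(i, e), i2)
               for e in C)


def lock_digest(digest, other):
    """Digest effect of lock(a) and join: empty if `other` cannot run yet."""
    return {digest} if may_run(digest, other) else set()


e1, e3 = ("u1", "t1"), ("u3", "t1")
t1 = compose(MAIN, e1)
t13 = compose(t1, e3)
```

Before main passes the create edge ⟨u1, t1⟩, neither `t1` nor its descendant `t13` may run, so the corresponding lock constraints contribute nothing.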


This analysis of thread ids and uniqueness can be considered as a
May-Happen-In-Parallel (or, more precisely, Must-Not-Happen-In-Parallel) analysis. MHP
information is useful in a variety of scenarios: a thread-modular analysis of data
races or deadlocks, e.g., that does not consider thread ids and joining, can be
refined with this analysis to exclude more data races or deadlocks. Subsequently,
we outline how the analysis from Section 4 may benefit from MHP information.


main:
  x = g;                          // PP u1
  y = create(t1);
  for(i = 0; i < 5; i++) {        // PP u2
    z = create(t1); }

t1:
  g = 42;                         // PP u3
  y = create(t1);


Fig. 8: Program with multiple thread creations.


7 Exploiting Thread IDs to Improve Relational Analyses 


We subsequently exploit abstract thread ids and their uniqueness to limit the
amount of reading performed by the analysis from Section 4. Specifically, the
analysis should avoid reading


I1 writes from other threads that have not yet been created;
I2 the ego thread's past writes, if its thread id is unique;
I3 past writes from threads that have already been joined.


Improvements I1 and I3 have, e.g., been realized in a setting where thread ids
and which thread is joined where can be read off from control-flow graphs [31].
Here, however, this information is computed during analysis. In our framework,
I1 is already achieved by refining the base analysis according to Section 6.


Example 9. Consider the program below where M[g] = {a, b, m_g}, M[h] =
{a, b, m_h}, M[i] = {m_i} and assume Qa = {{g, h}}.

main:
  x = create(t1); lock(a);
  // ASSERT(g==h); (1)
  unlock(a);
  y = create(t2); lock(a);
  // ASSERT(g==h); (2)
  g = 42; h = 42;
  unlock(a);
  i = 3; i = 2; // ASSERT(i==2); (3)
  i = 8;

t1:
  lock(a);
  r = ?; g = r; h = r;
  unlock(a);

t2:
  lock(a); v = g; unlock(a);
  z = create(t3);

t3:
  lock(a); g = 19; unlock(a);


The analysis succeeds in proving (1), as the thread (starting at) t3 that breaks the
invariant g=h has definitely not been started yet at this program point. Without
refinement, the analysis from Section 4 could not prove (1). However, this does
not suffice to prove (2). At this program point, t3 may already be started. At
the lock(a) in t2, t3 may also be started; thus, the violation of the invariant g=h
by t3 is incorporated into the local state of t2 at lock. At unlock(a), despite t2
only reading g, the imprecise abstract relation violating g=h is side-effected to
[a, {g, h}, t2] and is incorporated at the second lock(a) of the main thread. The
final shortcoming is that each thread reads all its own past (and future!) writes,
even when it is known to be unique. This means that (3) cannot be proven.


init♯_𝒜 = {((main, ∅), ∅)}
⟦u, x=create(u1)⟧♯_𝒜 (i, C) = {(i, C ∪ {⟨u, u1⟩})}
⟦u, a⟧♯_𝒜 (i, C) = {(i, C)}    (for other actions a)

new♯_𝒜 u u1 ((d, s), C) =
  let (d', s') = (d, s) ∘ ⟨u, u1⟩ in
  if s' = ∅ ∧ ⟨u, u1⟩ ∈ C then ((d, {⟨u, u1⟩}), ∅)
  else ((d', s'), ∅)

(d, s) ∘ ⟨u, u1⟩ =
  if d = (d0 · ⟨u, u1⟩) · d1 then (d0, s ∪ d̄1 ∪ {⟨u, u1⟩})
  else if s = ∅ then (d · ⟨u, u1⟩, ∅)
  else (d, s ∪ {⟨u, u1⟩})


Fig. 9: Right-hand sides for thread ids.


To achieve I2, some effort is required as our analysis forgets values of globals 
when they become unprotected. This is in contrast, e.g., to [39, 42]. We thus 
restrict side-effecting to mutexes to cases where the ego thread has possibly 
written a protected global since acquiring it. This is in contrast to Section 4, 
where a side-effect is performed at every unlock, i.e., everything a thread reads 
is treated as if it was written by that thread. 

Technically, we locally track a map L : (M × Q) → R, where L (a, Q) maintains,
for a mutex a, an abstract relation between the globals in cluster Q ∈ Qa.
More specifically, the abstract relation on the globals from Q recorded in L (a, Q)
is the one that held when a was unlocked join-locally for the first time after the
last join-local write to a global in G[a]. If there is no such unlock(a), the relation
at program start is recorded. We call an operation in a local trace join-local to
the ego thread if it (a) is thread-local, i.e., performed by the ego thread, or (b)
is executed by a thread that is (transitively) joined into the ego thread, or (c) is
join-local to the parent thread at the node at which the ego thread is created.
This notion will also be crucial for realizing I3. Join-locality is illustrated in
Fig. 10, where the join-local part of a local trace is highlighted.

For join-local contributions, it suffices to consult L a instead of unknowns
[a, Q, i]. Such contributions are accounted for. To check whether a contribution
from some thread id is accounted for, we introduce a function acc : (𝒜 ×
D_S) → 𝒜 → bool (see definition (7) below). Besides an abstract value from R, the
local state D_S now contains two additional components:


- The map L : (M × Q) → R, for which the join is given component-wise;

- The set W : 2^G (ordered by ⊆) of globals that may have been written since
  one of its protecting mutexes has been locked, and not all protecting mutexes
  have been unlocked since.


Just like r, L and W are abstractions of the reaching local traces. D_T is also
enhanced with an L component, while D_M remains unmodified. We sketch the
right-hand sides here; definitions are given in Fig. 11. For program start init♯,
in contrast to the analysis from Section 4, there is no initial side-effect to the
unknowns for mutexes. The initial values of globals are join-local, and thus
accounted for in the L component also passed to any subsequently created thread.

The right-hand sides for thread creation and return differ from the analysis
from Section 4 enhanced with thread ids only in the handling of additional
data structures L and W. As the thread ids are tracked precisely in the 𝒜
component, this information is directly used when determining which unknown to
side-effect to, and unknowns [(i, C)] replace unknowns [i', (i, C)].

For join, if the return value of the thread is not accounted for, it is assigned
to the variable on the left-hand side and the L information from the ego thread
and the joined thread is joined. If, on the other hand, it is accounted for, the
thread has already been joined and cannot be joined again. There is a separate
constraint for each (i', C'), so that all threads that could be joined are considered.

For locking of mutexes, upon lock, if (i', C') is not accounted for, its
information on the globals protected by a is joined with the join-local information
for a maintained in L (a, Q), Q ∈ Qa. This information about the globals
protected by a is then incorporated into the local state by ⊓. For unlocking of
mutexes, if there may have been a write to a protected global since the mutex
was locked (according to W), the join-local information is updated and the local
state restricted to Q is side-effected to the appropriate unknown [a, Q, (i, C)] for
Q ∈ Qa. Just like in Section 4, r is then restricted to only maintain relationships
between locals and those globals for which at least one protecting mutex
is still held. Reading from and writing to globals once more are purely local
operations. To exclude self writes, we set


  acc ((i, C), _) (i', C') = unique i ∧ i = i'      (7)


The resulting analysis thus takes I1 (via ⟦·⟧♯_𝒜, defined in Section 6), as well as
I2 (via acc) into account. In Example 9, it is now able to show all assertions.

Theorem 3. This analysis is sound w.r.t. the local trace semantics.


Proof. The proof relies on the following observations:


- When G[a] ∩ W = ∅, no side-effect is required.
- Exclusions based on acc are sound, i.e., it only excludes join-local writes.


The detailed proof is a simplification of a proof for an enhanced analysis from the
extended version [49, Appendix F], which we outline in Appendix G there.


The analysis does not make use of components C at unknowns [a, Q, (i, C)] and
[i, C]. In [49, Appendix E], we detail how this information can be exploited
to exclude a further class of writes, namely those that are performed by an
ancestor of the ego thread before the ego thread was created. Alternatively, an
implementation may abandon control-point splitting according to C at mutexes
and thread ids, replacing [a, Q, (i, C)], [i, C] with [a, Q, i] and [i], respectively.


[Diagram: a local trace over the actions x=create(t2), y=create(t1), lock(m_g),
g=1, unlock(m_g), z=join(x), lock(m_g), g=2, and return z, with the join-local
part highlighted.]

Fig. 10: Illustration highlighting the join-local part of a local trace of the program
from Fig. 2a, and which writes are thus accounted for by L.


init♯ η =
  let L (a, Q) = ⟦{g ← 0 | g ∈ Q}⟧♯_R ⊤ in
  let r = ⟦self ←♯ i0⟧♯_R ⊤ in
  (∅, ({(a, Q) ↦ L (a, Q) | a ∈ M, Q ∈ Qa}, ∅, r))

⟦[u, S, (i, C)], x=create(u1)⟧♯ η =
  let (L, W, r) = η [u, S, (i, C)] in
  let (i', C') = new♯_𝒜 u u1 (i, C) in
  let r' = (⟦self ←♯ (single i')⟧♯_R r)|_X in
  let ρ = {[u1, ∅, (i', C')] ↦ (L, ∅, r')} in
  (ρ, (L, W, ⟦x ←♯ single i'⟧♯_R r))

⟦[u, S, (i, C)], return x, (i, C)⟧♯ η =
  let (L, W, r) = η [u, S, (i, C)] in
  let v = (⟦ret ← x⟧♯_R r)|_{ret} in
  let ρ = {[(i, C)] ↦ (L, v)} in
  (ρ, (L, W, r))

⟦[u, S, (i, C)], x'=join(x), (i', C')⟧♯ η =
  let (L, W, r) = η [u, S, (i, C)] in
  if ((single i') ⊓ ((unlift r) x) = ⊥) then ⊥
  else if acc ((i, C), (L, W, r)) (i', C') then ⊥
  else
    let (L', v) = η [(i', C')] in
    let r' = ⟦x' ← (unlift v) ret⟧♯_R r in
    (∅, (L ⊔ L', W, r'))

⟦[u, S, (i, C)], lock(a), (i', C')⟧♯ η =
  let (L, W, r) = η [u, S, (i, C)] in
  let r' = if acc ((i, C), (L, W, r)) (i', C') then ⊥
           else ⊓_{Q ∈ Qa} η [a, Q, (i', C')] in
  (∅, (L, W, r ⊓ ((⊓_{Q ∈ Qa} L (a, Q)) ⊔ r')))

⟦[u, S, (i, C)], unlock(a), (i, C)⟧♯ η =
  let (L, W, r) = η [u, S, (i, C)] in
  let (L', ρ) = if G[a] ∩ W = ∅ then (L, ∅)
                else (L ⊕ {(a, Q) ↦ r|_Q | Q ∈ Qa},
                      {[a, Q, (i, C)] ↦ r|_Q | Q ∈ Qa}) in
  let W' = {g ∈ W | M[g] ∩ (S \ a) ≠ ∅} in
  let r' = r|_{X ∪ ⋃{G[a'] | a' ∈ (S \ a)}} in
  (ρ, (L', W', r'))

⟦[u, S, (i, C)], g = x⟧♯ η =
  let (L, W, r) = η [u, S, (i, C)] in
  (∅, (L, W ∪ {g}, ⟦g ← x⟧♯_R r))

⟦[u, S, (i, C)], x = g⟧♯ η =
  let (L, W, r) = η [u, S, (i, C)] in
  (∅, (L, W, ⟦x ← g⟧♯_R r))


Fig. 11: Right-hand sides for the improved (I1, I2) analysis using thread ids.


When turning to improvement I3, we observe that after joining a thread t 
with a unique thread id, t cannot perform further writes. As all writes of joined 
threads are join-local to the ego thread, it is not necessary to read from the 
corresponding global unknowns. We therefore enhance the analysis to also track 
in the local state, the set J of thread ids for which join has definitely been called 
in the join-local part of the local trace and refine acc to take J into account: 


  acc ((i, C), (J, L, W, r)) (i', C') = unique i' ∧ (i = i' ∨ i' ∈ J)


The extended version [49, Appendix F] gives details on this enhancement. 
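The refined predicate can be written down directly. The sketch assumes thread ids encoded as pairs (d, s) of a tuple of create edges and a frozenset, and a local state given as a tuple (J, L, W, r); with J = ∅ it coincides with definition (7).

```python
def unique(tid):
    """A thread id (d, s) is unique iff its set component s is empty."""
    return not tid[1]


def acc(digest, local_state, other):
    """True iff the contribution of `other` is already accounted for by L."""
    i, _C = digest
    J, _L, _W, _r = local_state
    i2, _C2 = other
    return unique(i2) and (i == i2 or i2 in J)


none = frozenset()
main = ((), none)
t1 = ((("u1", "t1"),), none)
ambiguous = ((), frozenset({("u2", "t1")}))
```

Only unique ids are ever excluded: for an ambiguous id, several concrete threads may hide behind the same abstract id, so neither self writes nor writes of joined threads can be skipped.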


8 Exploiting Clustered Relational Domains 


Naively, one might assume that tracking relations among a larger set of globals
is necessarily more precise than between smaller sets. Interestingly, this is no
longer true for our analyses, e.g., in the presence of thread ids. A similar effect,
where relating more globals can deteriorate precision, has also been observed in
the context of an analysis using a data-flow graph to model interferences [19].




Example 10. Consider again Example 1 in the introduction with Qa = {{g, h, i}}.
For this program, the constraint system of the analysis has a unique least
solution. It verifies that assertion (1) holds. It assures for [a, {g, h, i}, t1] that h=i
holds, while for the main thread and the program point before each assertion,
L (a, {g, h, i}) = {g=h, h=i} holds, while for [a, {g, h, i}, main] and [a, {g, h, i}, t2]
only ⊤ is recorded, as is for any relation associated with m_g, m_h, or m_i.
Assertion (2), however, will not succeed, as the side-effect from t1 causes the older
values from the first write in the main thread to be propagated to the assertions
as well, implying that while h=i is proven, g=h is not.


Intuitively, the analysis loses precision because, at an unlock of mutex a, the 
current relationships between all clusters protected by a are side-effected. As 
soon as one global is written to, the analysis behaves as if all protected globals 
had been written. By limiting publishing to those clusters for which at least one 
global has been written, more precise information may remain at others. 

In the improved analysis, when unlocking a mutex a, side-effects are only 
produced to clusters Q € Qa containing at least one global that was written to 
since the last lock(a). Definitions for locking and unlocking are given in Fig. 12. 
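The selection of clusters at unlock(a) amounts to a simple filter. The sketch assumes that the set W of possibly written globals is available as described in Section 7; the function name `clusters_to_publish` is hypothetical.

```python
def clusters_to_publish(clusters, W):
    """Clusters of Qa that contain at least one possibly written global."""
    W = frozenset(W)
    return [q for q in clusters if frozenset(q) & W]


Qa = [{"g", "h"}, {"h", "i"}, {"g"}]
```

For instance, if only i has been written, `clusters_to_publish(Qa, {"i"})` selects just {h, i}, so the values recorded for {g, h} and {g} remain untouched.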

For locking the mutex a, the abstract value to be incorporated into the local
state is assembled from the contributions of different threads to the clusters.
For that, the separate constraints for each admitted digest from Section 5 are
combined into one for the set I = {(i', C') | (i, C) ∈ ⟦u, lock(a)⟧♯_𝒜 ((i, C), (i', C'))}
of all admitted digests. This is necessary as side-effects to unaffected clusters at
unlock(a) have been abandoned and thus the meet with the values for clusters of
one thread at a time is unsound. For each Q, the join-local information L (a, Q)
is joined with all contributions to Q by threads that are not yet accounted for,
but admitted for Q by the digests. Here, the contributions of threads that do not
write Q are ⊥, and thus do not affect the value for Q. Finally, the resulting value
is used to improve the local state by meet. The right-hand side for lock(a) thus
exploits the fine-grained, per-cluster MHP information provided by the digests
and the predicate acc. We obtain:


Theorem 4. Given domains $\mathcal{R}$ and $\mathcal{V}^{\sharp}$ fulfilling the requirements from Fig. 1,
any solution of the constraint system is sound w.r.t. the local trace semantics.
Maximum precision is obtained with $\mathcal{Q}_a = 2^{\mathcal{G}[a]}$.


For Example 1, with $\mathcal{Q}_a = 2^{\mathcal{G}[a]}$, both assertions are verified. Performing the
analysis with all subclusters simultaneously can be expensive when sets $\mathcal{G}[a]$ are
large. The choice of subclustering thus generally involves a trade-off between 
precision and runtime. This is different for k-decomposable relational domains: 


Theorem 5. Provided the relational domain is k-decomposable (Equation (2)),
the clustered analysis using all subclusters of sizes at most k only, is equally
precise as the clustered analysis using all subclusters $\mathcal{Q}_a = 2^{\mathcal{G}[a]}$ at mutexes $a$.


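Proof. Consider a solution $\eta$ of the constraint system with $\mathcal{Q}_a = 2^{\mathcal{G}[a]}$. Then for
unknowns $[a,Q,(i,C)]$ and $[a,Q',(i,C)]$ with $Q \subseteq Q'$ and $|Q| \le k$, and values
$r = \eta\,[a,Q,(i,C)]$, $r' = \eta\,[a,Q',(i,C)]$, we have that $r \sqsubseteq r'|_Q$ (whenever the smaller
cluster receives a side-effect, so does the larger one). Thus, by k-decomposability,
the additional larger clusters $Q'$ do not improve the meet over the clusters of
size at most k for individual thread ids as well as the meet of their joins over all
thread ids. The same also applies to the clustered information stored in $L$.


\[
\begin{array}{l}
\llbracket [u,S,(i,C)],\ \mathsf{unlock}(a),\ (i,C) \rrbracket^{\sharp}\,\eta = \\
\quad \mathbf{let}\ (L,W,r) = \eta\,[u,S,(i,C)]\ \mathbf{in} \\
\quad \mathbf{let}\ \mathcal{Q}' = \{Q \mid Q \in \mathcal{Q}_a,\ Q \cap W \neq \emptyset\}\ \mathbf{in} \\
\quad \mathbf{let}\ L' = L \oplus \{(a,Q) \mapsto r|_Q \mid Q \in \mathcal{Q}'\}\ \mathbf{in} \\
\quad \mathbf{let}\ \rho = \{[a,Q,(i,C)] \mapsto r|_Q \mid Q \in \mathcal{Q}'\}\ \mathbf{in} \\
\quad \mathbf{let}\ r' = r|_{\mathcal{X} \cup \bigcup\{\mathcal{G}[a'] \mid a' \in (S \setminus a)\}}\ \mathbf{in} \\
\quad \mathbf{let}\ W' = \{g \mid g \in W,\ \mathcal{M}[g] \cap S \setminus \{a\} \neq \emptyset\}\ \mathbf{in} \\
\quad (\rho, (L',W',r'))
\end{array}
\]

\[
\begin{array}{l}
\llbracket [u,S,(i,C)],\ \mathsf{lock}(a),\ I \rrbracket^{\sharp}\,\eta = \\
\quad \mathbf{let}\ (L,W,r) = \eta\,[u,S,(i,C)]\ \mathbf{in} \\
\quad \mathbf{let}\ \ell = ((i,C),(L,W,r))\ \mathbf{in} \\
\quad \mathbf{let}\ J(Q) = \bigsqcup \{\eta\,[a,Q,(i',C')] \mid (i',C') \in I \text{ not yet accounted for in } \ell \text{ (predicate acc)}\}\ \mathbf{in} \\
\quad \mathbf{let}\ r' = \bigsqcap_{Q \in \mathcal{Q}_a} (J(Q) \sqcup L\,(a,Q))\ \mathbf{in} \\
\quad (\emptyset, (L,W,r \sqcap r'))
\end{array}
\]

Fig. 12: Right-hand sides for unlocking and locking when limiting side-effecting
to potentially written clusters.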


Example 11. Consider again Example 1. If the analysis is performed with clusters
$\mathcal{Q}_a = \{\{h,i\},\{g,h\},\{g,i\},\{g\},\{i\},\{h\}\}$, both assertions can be proven.
The one-element clusters, on the other hand, cannot be abandoned, as indicated
by the example from Appendix H in the extended version [49].
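To get a feeling for the trade-off behind Theorem 5, the number of clusters can be counted directly. The following is a worked count, assuming that $n = |\mathcal{G}[a]|$ globals are protected by $a$ and that the empty cluster is ignored:

\[
\bigl|\,2^{\mathcal{G}[a]} \setminus \{\emptyset\}\,\bigr| = 2^{n} - 1,
\qquad
\bigl|\{Q \subseteq \mathcal{G}[a] \mid 1 \le |Q| \le 2\}\bigr| = n + \binom{n}{2} = \frac{n(n+1)}{2}.
\]

For the three globals g, h, i this gives 7 versus 6 clusters, the latter being exactly the six clusters listed in Example 11. For $n = 10$ the counts are 1023 versus 55.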


9 Experimental Evaluation


We implemented [50] the analyses by extending the context-sensitive static
analyzer GOBLINT, which provides the set of protecting mutexes for each global.


The implementation tracks information about integral variables using either the 
Interval or the Octagon domains from APRON [29]. A comparison with other 
tools is difficult, for details see [49, Appendix I]: 


Duet [19] — Its benchmarks are only available as binary goto-programs 
which neither its current version nor any other tool considered here can 
consume. Since DUET does not support function calls, it could only be run 
on some of the benchmarks considered here. 

ASTREEA [39] — A public version is available but not licensed for evaluation. 
WATTS [31] — Since we were unable to run the tool on any program, we 
compared with the numbers reported by the authors. 

NR-GOBLINT [48] — GOBLINT with the non-relational analyses from [48]. 


We considered four different configurations, namely, Interval: the analysis from
Section 4 with Intervals; Octagon: the same analysis with Octagons; TIDs: the
analysis from Section 7 with enhancement [49, Appendix F] with Octagons;
Clusters: TIDs using clusters of size at most 2 only. All benchmarks were run in
a virtual machine on an AMD EPYC 7742 64-Core processor⁵ running Ubuntu
20.04. The results of our evaluation are summarized in Table 2.


Table 2: Summary of evaluation results, with individual programs grouped
together. For each group the number of programs and the total number of
assertions are given. ✓ (✗) indicates that all (no) assertions are proven, otherwise
the number of proven assertions is given. (–) indicates invalid results produced.

                                        Our analyzer
Set      Group        #Prog.  #Assert.  Interval  Octagon   TIDs      Clusters  NR-GOBLINT   DUET
                                        (Sec. 4)  (Sec. 4)  (Sec. 7)  (Sec. 8)  w/ interval
Our      Basic             3         4  ✓         ✓         ✓         ✓         ✓            3
         Relational       10        35  ✗         ✓         ✓         ✓         4            2
         TID              12        19  ✗         ✗         ✓         ✓         ✗            2
         Cluster           2         3  ✗         ✗         1         ✓         ✗            1
GOBLINT  POSIX             5      1679  1146      1490      ✓         ✓         1582         –
         SV-COMP           7       360  ✓         ✓         ✓         ✓         ✓            –
WATTS    Created           3         3  2         2         2         2         2            ✗
         SV-COMP           5         5  1         1         1         1         1            ✗
         LKMPG             1         2  ✗         ✗         ✗         ✗         ✗            ✗
         DDVERIFY         28      1071  1043      1043      ✓         ✓         1043         –
         Scalability       5       740  735       735       ✓         ✓         735          –
RATCOP                    19        34  4         14        18        18        6            4


Our benchmarks. To capture particular challenges for multi-threaded relational 
analysis, we collected a set of small benchmarks (including the examples from this 
paper) and added assertions. On these, we evaluated our analyzer, NR-GOBLINT, 
and DUET. Our analysis in the Clusters configuration is capable of verifying all 
the programs. The other tools could only prove a handful of relational assertions. 


GOBLINT benchmarks [48]. These benchmarks do not contain assertions. To still 
relate the precision of our analyzer to the non-relational NR-GOBLINT and to 
DUET, we used our tool in the Clusters setting to automatically derive invariants 
at each locking operation. Perhaps surprisingly, NR-GOBLINT could verify 95% 
of the invariants despite being non-relational and not using thread ids. 


Watts benchmarks [31]. These programs were instrumented with asserts and 
significantly changed by the authors. Our analyses can verify all but 7 out of 
over 1000 assertions. Due to necessary fixes to programs and our inability to run 
their tool, numbers are not directly comparable. Nevertheless, for their scalability 
tests, reported runtimes for WATTS are up to two orders of magnitude worse than 
ours. See [49, Appendix I] for a more detailed discussion. 


5 The analyzer is single-threaded, so it only used one (virtual) core per analysis job.


Name        LLoC   Thread ids   More precise
                   (unique)     with thread ids
pfscan       550    3 (2)       19.0%
aget         581    6 (4)        0.0%
ctrace       651    3 (3)        0.0%
knot         973    9 (5)        0.0%
smtprc      3013    2 (2)        0.8%
iowarrior   1358    4 (4)       17.1%
adutux      1509    4 (4)        0.0%
w83977af    1515    6 (4)       12.1%
tegra20     1560    7 (5)        0.0%
nsc         2394   11 (7)       32.2%
marvell1    2476    6 (5)       59.5%
marvell2    2476    6 (5)       58.4%

(a) Number of discovered thread ids and proportion of program points where
analysis with thread ids is more precise.

(b) Analysis times. [Plot: analysis time per benchmark for the configurations
Interval, Octagon, TIDs, and Clusters.]


Fig. 13: Precision and performance evaluation on the GOBLINT benchmark set.


RATCOP benchmarks [42]. These are Java programs. After manual translation
to C, our analyzer, using Octagons, proved every assertion that any configuration
of RATCOP could prove, whereas RATCOP required polyhedra in one case.


Internal comparison. We evaluated our analyses in more detail on the GOBLINT
benchmark set [48]. Fig. 13a shows sizes of the programs (in Logical LoC) and the 
number of thread ids found by the analysis from Section 6. The high number of 
threads identified as unique is encouraging. To evaluate precision, we compared 
the abstract values at each program point (joined over contexts). Fig. 13a shows 
for what proportion of program points tracking thread ids increases precision. 
There were no program points where precision decreased or values became in- 
comparable, while for some programs gains of over 50% were observed. Fig. 13b 
illustrates runtimes. In 9 of 12 cases, performance differences between our re- 
lational analyses are negligible. In all cases, using clusters incurs no additional 
cost. Thus, the more precise analysis with clusters of size ≤ 2 seems to be the
method of choice for thread-modular relational abstract interpretation. 


10 Related Work 


Since its introduction by Miné [36, 37], the weakly relational numerical domain
of Octagons has found wide-spread application for the analysis and verification
of programs [8, 14]. Since tracking relations between all variables may be
expensive, pre-analyses have been suggested to identify clusters of numerical
variables whose relationships may be of interest [8, 14, 26, 45]. A dynamic approach
to decompose relational domains into non-overlapping clusters based on learning
is proposed by Singh et al. [55]. While these approaches trade (unnecessary)
precision for efficiency, others try to partition the variables into clusters without
compromising precision [15, 23, 24, 44, 54, 56]. These types of clustering are
orthogonal to our approach and could, perhaps, be combined with it.


The integration of relational domains into thread-modular abstract inter- 
pretation was pioneered by Miné [39]. His analysis is based on lock invariants 
determining for each mutex a relation which holds whenever the mutex is not 
held. Weak interferences are used to account for asynchronous variable accesses. 
For practical analyses, a relational abstraction only for lock invariants is pro- 
posed, while using a coarse, non-relational abstraction for the weak interfer- 
ences. This framework closely follows the framework for non-relational analysis 
[38], while abandoning background locksets. Our relational analysis, on the other 
hand, maintains at each mutex a only relations between variables write-protected 
by a. For these relations more precise results can be obtained, since they are in- 
corporated into the local state at locks by meet (while [39] uses join). 


Monat and Miné [40] present an analysis framework which is orthogonal to 
our approach. It is tailored to the verification of algorithms that do not rely on 
explicit synchronization via mutexes such as the Bakery algorithm. Suzanne and 
Miné [57] extend [40] to handle weak memory effects (PSO, TSO) by incorporat- 
ing memory buffers into the thread-local semantics. The notion of interferences 
is also used by Sharma and Sharma [52] for the analysis of programs under the 
Release/Acquire Memory Model of C11 by additionally tracking abstractions
of modification sequences for global variables. They consider fixed finite sets of 
threads only, and do not deal with thread creation or joining. 


Earlier works on thread-modular relational analysis rely on DATALOG rules 
to model interferences in the sense of Miné in combination with abstract inter- 
pretation applied to the Data-Flow Graph [19] or the Control-Flow Graph [31] 
(later extended to weak memory [32]), respectively. Botbol et al. [10] give a 
non-thread-modular analysis of multi-threaded programs with message-passing 
concurrency by encoding the program semantics as a symbolic transducer. 


In all these approaches clusters of variables, if there are any, are predefined 
and not treated specially by the analysis. This is different in the thread-modular 
analysis proposed by Mukherjee et al. [42]. It propagates information from un- 
locks to locks. It is relational for the locals of each thread, and within disjoint 
subsets of globals, called regions. These regions must be determined beforehand 
and must satisfy region-race freedom. In contrast, the only extra a priori
information required by our analysis consists of the sets of (write-)protecting
mutexes of globals, which can be computed during the analysis itself. The closest
concept within our approach to a region is the set of globals jointly protected by
mutexes. These sets may overlap, which the analysis explicitly exploits. Like ours,
their proof of correctness refers to a thread-local semantics. Unlike ours, it is based
on interleavings and thus overly detailed. The concrete semantics on which our
analyses are based is a collecting local trace semantics extending the semantics
of Schwarz et al. [48] by additionally taking thread termination and joins into
account. The analyses in [48], however, are non-relational. No refinement via
further finite abstractions of local traces, such as thread ids, is provided.


The thread id analysis perhaps most closely related to ours is by Feret
[20], who computes ids for agents in the π-calculus as abstractions of sequences
of encountered create edges. Another line of analysis of concurrent programs
deals with determining which critical events may happen in parallel (MHP)
[1–4, 7, 17, 43, 59] to detect programming errors like, e.g., data races, or identifying
opportunities for optimization. Mostly, MHP analyses are obtained as abstrac- 
tions of a global trace semantics [18]. We apply related techniques for improving 
thread-modular analyses — but based on a local trace semantics. Like MHP anal- 
yses, we take thread creation and joining histories as well as sets of held mutexes 
into account. Additionally, we also consider crucial aspects of the modification 
history of globals and provide a general framework for further refinements. 


In a sequential setting, splitting control locations according to some abstrac- 
tion of reaching traces is a common technique for improving the precision of 
dataflow analyses [9, 27] or abstract interpretation [25, 34, 41, 47]. Control point 
splitting can be understood as an instance of the reduced cardinal power do- 
main [12, 13, 22]. For the analysis of multi-threaded programs, Miné [39] applies 
the techniques of Mauborgne and Rival [34] to single threads, i.e., independently 
of the actions of all other threads. Our approach, on the other hand, may take 
arbitrary properties of local traces into account, and thus is more general. 


11 Conclusion and Future Work 


We have presented thread-modular relational analyses of global variables tailored 
to decomposable domains. In some cases, more precise results can be obtained by 
considering smaller clusters. For k-decomposable domains, however, we proved 
that the optimal result can already be obtained by considering clusters of size 
at most k. We have provided a framework to incorporate finite abstractions of 
local traces into the analysis. Here, we have applied this framework to take cre- 
ation as well as joining of threads into account, but believe that it paves the way 
to seamlessly enhance the precision of thread-modular abstract interpretation. 
The evaluation of our analyses on benchmarks proposed in the literature indi- 
cates that our implementation is competitive both w.r.t. precision and efficiency. 
In future work, we would like to experiment with further abstractions of local 
traces, perhaps tailored to particular programming idioms, and also explore the 
potential of non-numerical 2-decomposable domains.


Abstract. Many program analysis tools and techniques have been
developed to assess program vulnerability. Yet, they are based on the
standard concept of reachability and represent an attacker able to craft smart
legitimate input, while in practice attackers can be much more powerful, 
using for instance micro-architectural exploits or fault injection methods. 
We introduce adversarial reachability, a framework allowing to reason 
about such advanced attackers and check whether a system is vulnera- 
ble or immune to a particular attacker. As equipping the attacker with 
new capacities significantly increases the state space of the program un- 
der analysis, we present a new symbolic exploration algorithm, namely 
adversarial symbolic execution, injecting faults in a forkless manner to 
prevent path explosion, together with optimizations dedicated to reduce 
the number of injections to consider while keeping the same attacker 
power. Experiments on representative benchmarks from fault injection 
show that our method significantly reduces the number of adversarial 
paths to explore, allowing it to scale up to 10 faults where prior work
times out at 3 faults. In addition, we analyze the well-tested WooKey
bootloader, and demonstrate the ability of our analysis to find attacks
and evaluate countermeasures in real-life security scenarios. In particular,
we were able to find an attack not mentioned in a previous patch.


Keywords: Program analysis · Attacker model · Fault injection · Symbolic execution


1 Introduction 


Context. Major works have delved into program analysis over the last decades, 
leveraging techniques such as symbolic execution [24,53,18], static analysis [43], 
abstract interpretation [30] or bounded model checking [29], to hunt for software 
vulnerabilities and bugs in programs, or to prove their absence [35,60], leading 
to industrial adoption in some leading companies [18,43,6,60,66]. As bugs are an 
attack entry point, removing them is a first step towards better software security. 


* Partially supported by grants ANR TAVA, PEPR Secureval and Carnot Flexsecurity.

Problem. Yet, stepping back from these successes, it appears that all these
methods consider a rather weak threat model, where the attacker can only craft 
smart “inputs of death” through legitimate input sources of the program, ex- 
ploiting corner cases in the code itself. Tools only looking for bugs and software 
vulnerabilities may deem a program secure while the bar remains quite low for 
an advanced attacker, able for example to take advantage of attack vectors such 
as (physical) hardware fault injections [58], micro-architectural attacks [61,70], 
software-based hardware attacks [86,55,69] like Rowhammer [70], or any com- 
bination of vectors [63]. While previously limited to high-security devices and 
systems such as smart cards and cryptography modules [16,13], these fault-based 
attacks can now target a wider spectrum of systems, such as bootloaders [57], 
firmware update modules [19], security enclaves [69], etc. The reasoning behind 
automated software-implemented fault injection also applies to Man-At-The-End 
attacks [3] and is similar to the (manual) reasoning performed in control-flow 
integrity to evaluate countermeasures [1,21]. 


Goal & Challenges. Our goal is to devise a technique to automatically and 
efficiently reason about the impact of an advanced attacker on program
security properties, where the standard reachability framework only supports an
attacker crafting smart legitimate inputs. The first challenge is to provide a for- 
mal framework to study what an advanced attacker can do to attack a program. 
Interestingly, while such frameworks are routinely used in cryptographic pro- 
tocol verification [26,7], none has been studied for program-level analysis. The 
second challenge is to design an efficient algorithm to assess the vulnerability of 
a program to a given attacker model, while adding capabilities to the attacker 
naturally gives rise to a significant path explosion — especially in the case of 
multiple fault analysis. 

The rare prior works in the field, mostly focused on encompassing physical
fault injections for high-security devices, rely mainly on mutant generation
[28,79,49,25,50] or forking analysis [76,15,20,63], yielding scalability issues.
Moreover, most of them are limited to a few predefined fault models and do not 
propose any formalization of the underlying problem. 


Proposal. We propose adversarial reachability, a formalism extending standard 
reachability to reason about a program execution in the presence of an advanced 
attacker, and we build a new algorithm based on symbolic techniques, named 
adversarial symbolic execution, to address the adversarial reachability problem 
from the bug finding point of view (bounded verification). Our algorithm pre- 
vents path explosion thanks to a new forkless encoding of faults. We show it is 
correct and k-complete with respect to adversarial reachability. To improve the 
performance further, we design two new optimizations to reduce the number of 
injected faults: Early Detection of fault Saturation and Injection On Demand. 


Contributions. As a summary, we claim the following novelties: 
– We formalize the adversarial reachability problem (Section 4), extending
standard reachability to take into account an advanced attacker, together
with the associated correctness and completeness definitions;

– We describe a new symbolic exploration method (Section 5), adversarial
symbolic execution, to answer adversarial reachability, featuring a novel forkless
fault encoding to prevent path explosion and two optimization strategies to
reduce fault injection. We establish their correctness and completeness;

– We propose an implementation of our techniques for binary-level analysis
(Section 6), on top of the BINSEC framework [38]. We systematically evaluate
its performance against prior work (Section 7), using a standard SWiFI
benchmark from physical fault attacks and smart cards. Experiments show
a very significant performance gain against prior approaches, for example
up to ×10 and ×215 on average for 1 and 2 faults respectively, with
a similar reduction in the number of adversarial paths. Moreover, our
approach scales up to 10 faults whereas the state of the art starts to time out
at 3 faults;

– We finally perform a security analysis of the WooKey bootloader¹ (Section
8), a very well tested real-life security-focused program. We were able to find
known attacks and evaluate the adequacy of some of the countermeasures.
In particular, we found an attack not taken into account in a recently proposed
patch [63], and proposed a new patch to the developers.


This work is a first step in designing efficient program analysis techniques able to 
take into account advanced attackers. The approach is generic enough to accom- 
modate many common fault models, including the bit flip from RowHammer, 
test inversion or arbitrary data modification; still, instruction skips or modifica- 
tions are currently out of reach. Also, while we investigate the bug finding side 
of the problem (underapproximation), the verification side (overapproximation) 
is interesting as well. These are exciting directions for future research. 


Our dataset and benchmark infrastructure are made available through an
artifact² for reproducibility purposes, and the code is open-sourced³.


2 Motivation 


We start by motivating the need for adversarial reachability, first with a descrip- 
tion of several realistic attack scenarios on software involving advanced attackers 
(Section 2.1), second with a small example showing the need for dedicated anal- 
ysis (Section 2.2). 


2.1 Fault Injection across Security Fields 


We describe hereafter several real software-level security scenarios where the
attacker goes beyond crafting legitimate input to abuse the system under
attack. Interestingly, while these scenarios were historically focused on
hardware-hardened high-security systems (such as smart cards) and associated with
complex physical attack means, many recent scenarios do involve software-only
attacks on standard systems, with targets encompassing cryptographic libraries,
bootloaders, firmware updaters, security enclaves, etc.


1 WooKey [14,89] is a secure USB mass storage device developed by the French
National Security Agency, and has recently served as a challenge among French
security evaluators.

2 DOI: 10.5281/zenodo.7507112, https://zenodo.org/record/7507112

3 https://github.com/binsec/binsec-ase


Hardware Fault Injection Attacks [58] cause erroneous computations by 
disturbing signal propagation in the chip with physical means such as electro- 
magnetic pulses [39], laser beams [85,4], or power [19] and clock glitches. The 
associated fault models include bit-, byte- or word- set and reset, bit-flips,
instruction corruption and instruction skips. State-of-the-art attacks involve
multiple fault injections [59], as expected by the high level of attack potential in
Common Criteria vulnerability analysis. 


Software-implemented Hardware Attacks push the hardware into unstable 
states using software controlled mechanisms, like delays in memory buses induc- 
ing bit-flips in data fetched from memory [55] or CPU voltage and frequency 
manipulations yielding bit-flips in the processor [86,69]. The notorious Rowham- 
mer attack [70] abuses memory accesses to induce bit-flips in flash memory. 


Micro-architectural Attacks use micro-architectural behaviors in unexpected 
ways. For example: Spectre (version v1) [62] exploits branch predictors in spec- 
ulative executions, which can be seen as a test inversion followed by a rollback; 
Load Value Injection [87] injects arbitrary data into transient execution; race 
attacks [54] corrupt data of other running processes and can be seen as arbitrary 
data faults. 


Man-At-The-End Attacks consider an attacker having full observability and
control over a software code and its execution [3], with the goal to steal sensitive 
data or code (reverse engineering attacks). The associated attacker model is 
hence very powerful, with capabilities such as halting and modifying data and 
code at any point of the execution. 


CFI Reasoning. In order to assess the power of Control-Flow Integrity (CFI)
mechanisms, researchers [1,21] define hypothetical attackers by their capabilities,
such as “write anything anywhere” or “write anything somewhere”, and manually
prove that their countermeasure is indeed able to thwart such an opponent. 
While not per se an applicative security scenario, the techniques developed in 
this paper could help automate such essential reasoning. 


2.2 Motivating Example 


The motivating example in Figure 1 is a simple unrolled program inspired by
the VerifyPIN benchmark [42], from the domain of hardware fault injection and
smart cards. The user PIN digits u1 to u4 are checked against the reference digits
ref1 to ref4, using the accumulator res. The attacker seeks to be authenticated
(validate the assert l.16) without knowing the right digits (l.14).


     1  bool g_authenticated;
     2  int u1, u2, u3, u4, ref1, ref2, ref3, ref4;
     3
     4  void verifyPIN() {
     5      int res = 1;
     6      res = res * (u1 == ref1);
     7      res = res * (u2 == ref2);
     8      res = res * (u3 == ref3);
     9      res = res * (u4 == ref4);
    10      g_authenticated = res;
    11  }
    12
    13  void main(int argc, char const *argv[]) {
    14      assert(u1 != ref1 || u2 != ref2 || u3 != ref3 || u4 != ref4);
    15      verifyPIN();
    16      assert(g_authenticated == true);  /* Security oracle */
    17  }


Fig. 1: Motivating example, inspired by VerifyPIN [42]


Here, the attacker indeed cannot succeed by only crafting legitimate inputs. 
However, an advanced attacker can leverage more powerful attack vectors to 
inject faults into the program in order to succeed. For instance, corrupting 
g_authenticated to true at l.10 achieves the attacker goal. It could be obtained
for example through a physical or Rowhammer attack.
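The effect of such a data fault can be mimicked in plain C. The sketch below is an illustration only, not part of the benchmark: the type `fault_kind` and the function `verify_pin_with_fault` are hypothetical names, and the fault is modelled simply as overwriting the stored result with true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attacker action on the write at l.10 of Fig. 1. */
typedef enum { NO_FAULT, SET_TRUE } fault_kind;

/* Same accumulator logic as verifyPIN, with an optional fault on the
 * value that is finally stored as the authentication result. */
bool verify_pin_with_fault(const int user[4], const int ref[4], fault_kind fault)
{
    assert(user != NULL && ref != NULL);
    int res = 1;
    for (int k = 0; k < 4; k++) {
        res = res * (user[k] == ref[k]);   /* stays 1 only if every digit matches */
    }
    bool authenticated = (res != 0);
    if (fault == SET_TRUE) {
        authenticated = true;              /* corrupted value replaces the computed one */
    }
    return authenticated;
}
```

With a wrong PIN, the call without fault returns false, while the same call with `SET_TRUE` returns true, which is exactly the behaviour the security oracle at l.16 is meant to rule out.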


Program Analysis. As expected, standard symbolic execution tools such as
Klee [22], angr [84] or BINSEC [38] do not report any violation here, as they
consider the simplest possible attacker. We can try to use SWiFI techniques
[76,15,20,63] (detailed in Section 3.1) from high-security system evaluation. Yet,
the standard forking approach does not scale with multiple faults: here, 166
paths are explored in 0.6 seconds for 1 fault, 2994 paths in 11 seconds for 2
faults, and each extra fault adds a factor ×10 in explored paths and analysis
time, until the analysis times out (12 hours) above 4 faults. In contrast, our
forkless algorithm presented in Section 5 simulates fault injection without
creating new paths and, in this example, shows a constant runtime as the number
of faults increases from 1 to 10: we explore 9 paths in 0.2 seconds in all cases.


3 Background 


We provide in this section background information on software-implemented
fault injection, standard reachability and symbolic execution.

3.1 Software-implemented Fault Injection (SWiFI)


SWiFI tools [28,76,79,15,49,25,20,50,63,68] have been developed in the
community of high-security systems to ease hardware fault injection campaigns, which
are time consuming and require special equipment. SWiFI evaluates a program
with the transformations induced by the effects of hardware faults, in order to
find interesting attack paths. We distinguish two main SWiFI techniques.

First, the Mutant generation approach [28,79,49,25,50] consists in analyzing 
slightly modified versions of the program (named mutants), each of them embed- 
ding a different faulty instruction. Each mutant is then analyzed on its own. The 
main limitation of mutant generation is the explosion of mutants, in particular 
for multiple faults. Also, as the different mutants differ only slightly, analyzing 
each of them separately wastes lots of time repeating similar reasoning. 


                                if (fault_here)
                                  then x := fault_value
    x := y + z                    else x := y + z

    (a) Original statement      (b) Forking transformation


Fig. 2: Forking code transformation in pseudo-code


Second, the forking approach [76,15,20,63] consists in modifying the analysis
(or the code, via instrumentation) to add all possible faults as forking points
(branches) controlled by boolean values indicating whether a particular fault will
be taken or not, plus constraints on the maximal number of faults allowed. A
forking data fault is illustrated in Figure 2. A standard program analysis
technique is then launched, typically symbolic execution or bounded model
checking. Compared with mutant generation, this method allows sharing the analysis
between the different possible faults. Still, the number of paths explodes with
the number of possible faults (forking points).
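The forking transformation of Figure 2, together with a bound on the number of faults, can be written as an ordinary C function. This is a minimal sketch under stated assumptions: `assign_with_fork` and the counter `faults_left` are hypothetical names, and the inputs are concrete here, whereas a symbolic executor would treat `fault_here` and `fault_value` as symbolic and fork on the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Forking-style instrumentation of the statement x := y + z (Fig. 2).
 * `fault_here` selects the faulty branch, `fault_value` is the injected
 * value, and `faults_left` enforces the maximal number of faults.
 * Returns the value assigned to x.
 */
int assign_with_fork(int y, int z, bool fault_here, int fault_value,
                     unsigned *faults_left)
{
    assert(faults_left != NULL);
    if (fault_here && *faults_left > 0) {
        (*faults_left)--;        /* one fault of the budget is consumed */
        return fault_value;      /* faulty branch: x := fault_value */
    }
    return y + z;                /* normal branch: x := y + z */
}
```

Every instrumented statement contributes one such branch, which is why the number of explored paths grows with the number of possible fault locations.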
Scalability Issues. These two approaches yield an explosion of the whole search
space w.r.t. the number of fault injection points in the program: the mutant
approach leads to considering up to $C_n^k$ ($k$ among $n$)⁴ mutants for a program under
analysis with $n$ possible fault locations and $k$ faults, while the forking approach
yields up to $C_n^k$ paths to analyze for a single original program path with $n$
possible fault locations and $k$ faults.

In the following, we will consider the forking approach as the baseline; please
keep in mind that the mutant approach scales worse.
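To make the growth of this bound concrete, the binomial coefficient can be computed directly. The helper below is a small sketch; the function name `binomial` and the choice of 20 fault locations in the note that follows are assumptions made for illustration, not figures from the benchmarks.

```c
#include <stdint.h>

/* Binomial coefficient C(n, k) = n! / (k! (n-k)!), computed incrementally
 * without factorials. Intended for small n; overflow is not checked. */
uint64_t binomial(unsigned n, unsigned k)
{
    if (k > n) {
        return 0;
    }
    if (k > n - k) {
        k = n - k;                           /* symmetry: C(n,k) = C(n,n-k) */
    }
    uint64_t result = 1;
    for (unsigned i = 1; i <= k; i++) {
        /* after this step, result equals C(n-k+i, i), so the division is exact */
        result = result * (n - k + i) / i;
    }
    return result;
}
```

For a hypothetical path with 20 possible fault locations, this gives 20 combinations for one fault, 190 for two and 1140 for three.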

Fault Models. Supported fault models vary for each tool, but they are usually
adapted from hardware fault models [47,82]. The most common fault models are
(1) data faults such as arbitrary data modifications, set and reset of bytes, words
or variables, bit-flips; and (2) instruction corruptions such as instruction skips
and test inversions. Most tools are limited to one (sometimes two) hard-coded
fault models. Only few SWiFI tools can handle multiple faults [88,76,63,68],
still with scalability issues.


4 Recall that $C_n^k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.


3.2 Standard Reachability Formalization 


Considering a program $P$, we denote $S$ the set of all possible states of $P$. A state
is composed of the code memory, the data memory (i.e. the stack and heap),
the state of registers and the location of the next instruction to execute. The
set of input states of a program $P$ is noted $S_0 \subseteq S$. The set of transitions (or
instructions) of the program is denoted $T$. The execution of an instruction $t$ is
represented by a one-step transition relation $\to_t \subseteq S \times S$. We denote $s \to s'$ when
$s \to_t s'$ for some $t \in T$. We extend the transition relation over any finite path
$\pi \in T^*$ through composition. The transitive reflexive closure of $\to$ is noted $\to^*$.
Finally, we use $S \to s'$ as a shortcut for $\exists s \in S.\ s \to s'$, and $\to^{\leq k}$ for reachability
in at most $k$ steps.


We consider in the rest of the paper the case of location reachability: given
a location $l$ (instruction or code address) of the program under analysis, the
question is whether we can reach any state $s$ at location $l$. More formally, $L$ is
the finite set of locations of $P$, and we consider a mapping $loc : S \to L$ from
states to locations. For example, $loc$ may return the program counter value. We
write $S \to^* l$ as a shortcut for $\exists s' \in S.\ S \to^* s' \wedge loc(s') = l$.


Definition 1 (Standard reachability). A location $l$ is reachable in a program
$P$ if $S_0 \to^* l$.
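
As an illustration of Definition 1, bounded location reachability can be checked by brute force on a small explicit-state system. The Python sketch below is not how symbolic tools proceed; the toy program, the state encoding as (location, x) pairs and all names are assumptions made for this example only.

```python
from collections import deque


def reachable(initial, step, loc, target, k):
    """Is some state at location `target` reachable from `initial` in at most k steps?"""
    frontier = deque((s, 0) for s in initial)
    seen = set(initial)
    while frontier:
        state, depth = frontier.popleft()
        if loc(state) == target:
            return True
        if depth == k:
            continue  # depth bound reached on this path
        for nxt in step(state):
            if nxt not in seen:  # breadth-first: first visit is at minimal depth
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def toy_step(state):
    """Hypothetical program: 0: if x == 3 goto 2 else goto 1;  1: x := x + 1; goto 0."""
    pc, x = state
    if pc == 0:
        return [(2, x)] if x == 3 else [(1, x)]
    if pc == 1:
        return [(0, x + 1)]
    return []  # location 2 has no successor


def toy_loc(state):
    return state[0]  # loc returns the program counter


TOY_INIT = {(0, x) for x in range(3)}  # input states: x in {0, 1, 2}

if __name__ == "__main__":
    print(reachable(TOY_INIT, toy_step, toy_loc, target=2, k=3))
    print(reachable(TOY_INIT, toy_step, toy_loc, target=2, k=2))
```

In this toy system, location 2 needs three steps from the input state with x = 2, so the answer depends on the bound k, which is the situation the notion of k-completeness accounts for.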


We now define correctness and completeness for a program analyzer. 


Definition 2 (Correctness, completeness). Let $V : (P, l) \mapsto \{1, 0\}$ be a
verifier taking as input a program $P$ and a target location $l$.
- $V$ is correct when for all $P$, $l$, if $V(P, l) = 1$ then $l$ is reachable in $P$;
- $V$ is complete when for all $P$, $l$, if $l$ is reachable then $V(P, l) = 1$;
- if $V$ also takes an integer bound $n$ as input, $V$ is k-complete when for all
bounds $n$ and $P$, $l$, if $l$ is reachable in at most $n$ steps then $V(P, l, n) = 1$.


We stress that, while location reachability can be seen as a basic
case, we consider it sufficient here for two reasons: first, it keeps the formalism 
light while still straightforward to generalize to stronger reachability properties 
(e.g., local predicates of the form (l, p), sets of finite traces, etc.); second, it 
is already rather powerful on its own, as we can still instrument the code to 
reduce some stronger forms of reachability to it (e.g., adding local assertions or 
monitors). 


3.3 Symbolic Execution 


Symbolic execution (SE) [52,83,23,24] is a symbolic exploration technique for
standard reachability. Algorithm 1 gives a high-level view of a typical SE
algorithm, adapted for location reachability⁵.

Algorithm 1: Standard symbolic execution algorithm, taken from [48]

Input: a program $P$, a bound $k$, a target location $l$
Output: Boolean value indicating whether $l$ can be reached within $k$ steps.

    for path $\pi$ in GetPaths($k$) do
        if $\pi$ reaches $l$ then
            $\Phi$ := GetPredicate($\pi$)
            if $\Phi$ is satisfiable then
                return true
            end
        end
    end
    return false

The analysis follows each possible path $\pi$ of a program up to a depth bound $k$.
If $\pi$ reaches the target, then we check whether $\pi$ is indeed feasible by
computing its path predicate $\Phi$ (a logical formula representing the path
constraints over the input variables along $\pi$) and sending it to an SMT solver
[12], which tries to decide whether the formula is satisfiable and, if it is,
provides a model for the free variables such as inputs (omitted here for
simplicity). SE is correct for location reachability, and even k-complete if we
assume a perfect encoding of path predicates.


Algorithm 2: Assignment evaluation in SE

Input: path predicate $\Phi$, assignment instruction x := expr
Output: Updated $\Phi$

    Function eval_assign($\Phi$, x, expr) is
        return $\Phi \wedge (x \triangleq expr)$
    end


In this paper, we will focus on the evaluation of assignments and conditional
jumps for SE, detailed in Algorithms 2 and 3 respectively, as this is where our
adversarial symbolic execution will mainly differ from the standard one. This
requires going slightly deeper into details. In practice, the program paths are
explored incrementally. A worklist WL records all pending paths together with
their associated path predicate and their next instruction to explore. On
conditional branches, the symbolic path is split in two (one for each branch,
updating the path constraint accordingly), and each new prefix is added to the
worklist (Algorithm 3). Assignments are dealt with straightforwardly, simply
adding a new logical variable definition to the path predicate $\Phi$⁶
(notation: $x \triangleq y$).

⁵ More complex properties can be verified with the same principles, such as local
predicate reachability, trace properties or hyper-properties [36].

Algorithm 3: Conditional jump evaluation in SE

Input: path predicate $\Phi$, conditional jump instruction if cdt then $l_t$ else $l_e$
Data: a worklist WL containing the pending path prefixes to explore, as a list of
pairs (path predicate, next location)
Output: WL updated in place

    Function eval_conditional_jump($\Phi$, cdt, $l_t$, $l_e$) is
        if $\Phi \wedge cdt$ is satisfiable then
            Add ($\Phi \wedge cdt$, $l_t$) to WL
        end
        if $\Phi \wedge \neg cdt$ is satisfiable then
            Add ($\Phi \wedge \neg cdt$, $l_e$) to WL
        end
    end
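
The worklist-based exploration described here can be sketched in a few lines of Python. This is a toy under strong assumptions, not the algorithm of any particular tool: an exhaustive search over a small input domain stands in for the SMT solver, path predicates are lists of definitions and assumptions, and the program encoding and all names are hypothetical.

```python
from itertools import product


def holds(phi, env):
    """Replay a path predicate on a concrete input valuation."""
    for kind, payload in phi:
        if kind == "def":            # x ≜ expr: logical variable definition
            var, expr = payload
            env[var] = expr(env)
        elif not payload(env):       # assume: a branch condition
            return False
    return True


def satisfiable(phi, inputs, domain):
    """Brute-force stand-in for an SMT solver (assumption of this sketch)."""
    return any(holds(phi, dict(zip(inputs, values)))
               for values in product(domain, repeat=len(inputs)))


def eval_assign(phi, var, expr):
    return phi + [("def", (var, expr))]


def eval_conditional_jump(phi, cdt, l_then, l_else, depth, worklist, sat):
    for cond, nxt in ((cdt, l_then), (lambda env: not cdt(env), l_else)):
        branch = phi + [("assume", cond)]
        if sat(branch):              # only feasible prefixes are queued
            worklist.append((branch, nxt, depth))


def symbolic_execution(program, inputs, domain, target, k):
    sat = lambda phi: satisfiable(phi, inputs, domain)
    worklist = [([], 0, 0)]          # (path predicate, next location, depth)
    while worklist:
        phi, loc, depth = worklist.pop()   # depth-first exploration
        if loc == target:
            return True              # phi was checked when it was queued
        if depth == k or loc not in program:
            continue
        instr = program[loc]
        if instr[0] == "assign":
            _, var, expr, nxt = instr
            worklist.append((eval_assign(phi, var, expr), nxt, depth + 1))
        else:
            _, cdt, l_then, l_else = instr
            eval_conditional_jump(phi, cdt, l_then, l_else, depth + 1, worklist, sat)
    return False


# Hypothetical program: 0: y := 2 * x;  1: if y == 6 goto 2 else goto 3;
#                       3: if y == 7 goto 4 else goto 5
PROGRAM = {
    0: ("assign", "y", lambda env: 2 * env["x"], 1),
    1: ("branch", lambda env: env["y"] == 6, 2, 3),
    3: ("branch", lambda env: env["y"] == 7, 4, 5),
}

if __name__ == "__main__":
    print(symbolic_execution(PROGRAM, ["x"], range(16), target=2, k=5))
    print(symbolic_execution(PROGRAM, ["x"], range(16), target=4, k=5))
```

Location 4 is never queued because its prefix requires y = 7 while y is defined as 2x, so the infeasible branch is pruned at the conditional jump rather than at the end of the path.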


4 Adversarial Reachability 


In this section, we detail the advanced attacker model we consider and define
the adversarial reachability problem. In particular, advanced attackers can do
more than carefully craft legitimate inputs to trigger vulnerabilities in
software. They can use a wide variety of attack vectors (e.g. hardware fault
injection attacks, software-implemented hardware attacks, micro-architectural
attacks, software attacks, etc.), in any combination, and multiple times. We
assume that the prerequisites of the attack vectors have been met, and only
consider the impact of the faults on the program under attack.

Our attacker model has three components: (1) a set of attacker actions, equiv- 
alent to fault models; (2) the maximum number of actions the attacker can 
perform; and (3) a goal, expressed here as a location reachability query. 


Formally, given a program $P$ with set of states $S$, set of transitions $T$ and
set of locations $L$, we extend the transition model described in Section 3.2 to
include an adversarial transition $\rightsquigarrow_A \subseteq S \times S$ related to an attacker $A$, i.e.
$T_A = T \cup \rightsquigarrow_A$. To specify practical fault models, restrictions are applied to
$\rightsquigarrow_A$, limiting what part of the state can be modified and how. For instance, when
considering arbitrary data faults, only the data memory and the register values
can be modified. Then, the transition relation of $P$ under attacker $A$ is denoted
as $\mapsto_A = \to \cup \rightsquigarrow_A = (\bigcup_{t \in T} \to_t) \cup \rightsquigarrow_A$. We extend the notations from Section 3.2 to
the relation $\mapsto_A$. In particular, $S \mapsto_A^* s'$ means $\exists s \in S.\ s \mapsto_A^* s'$, and the
adversarial transition relation up to $k$ steps is denoted $\mapsto_A^{\leq k}$.


⁶ Actually, a symbolic state usually comprises the path predicate itself plus a mapping
from program variable names to logical variable names, and assignments involve both
creating new logical names and updating the mapping. We abstract away from these
details.
Still, we need to take into account the maximum number of faults the attacker
can perform along an execution. Given a path $\pi$ over $T_A$, $\pi$ is said to be
legit if it does not contain $\rightsquigarrow_A$, and faulty otherwise. The number of occurrences
of transition $\rightsquigarrow_A$ in $\pi$ is its number of faults. Given a bound $m_A$ on the fault
capability of $A$, we define $\mapsto^*_{(A, m_A)}$ by limiting the adversarial reachability
relation to paths $\pi$ with at most $m_A$ faults. We consider $m_A$ to be $+\infty$ in case the
attacker has no such limitation. For the sake of simplicity, in the following, we
will consider $m_A$ as an implicit parameter of $A$, and simply write $\mapsto_A^*$ instead of
$\mapsto^*_{(A, m_A)}$.


Definition 3 (Adversarial reachability). Given an attacker $A$ with a budget of
$m_A$ faults and a program $P$, a location $l \in L$ is adversarially reachable if
$S_0 \mapsto_A^* s' \wedge loc(s') = l$ for some $s' \in S$.


In the following, adversarial reachability of location $l$ from a set of states $S_0$
will be denoted $S_0 \mapsto_A^* l$.


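Proposition 1. Standard reachability implies adversarial reachability. The
converse does not hold.


Proof. Standard reachability can be viewed as adversarial reachability with an
attacker able to perform 0 faults. The converse fails because a fault can enable
behaviors that no legit execution exhibits: for instance, a location guarded by a
condition that never holds is not reachable in the standard sense, but can become
reachable under a test inversion fault.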
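
Definition 3 and Proposition 1 can be illustrated on an explicit-state toy, in the same brute-force spirit as a textbook reachability check. In the Python sketch below, configurations pair a state with the number of faults already used; the toy program, the arbitrary data fault on a single one-bit variable and all names are assumptions made for this example.

```python
from collections import deque


def adversarially_reachable(initial, step, fault, loc, target, k, m_a):
    """Bounded adversarial reachability with a budget of m_a faults.

    A fault counts as one step, like any other transition.
    """
    start = [(s, 0) for s in initial]           # (state, faults used so far)
    frontier = deque((cfg, 0) for cfg in start)
    seen = set(start)
    while frontier:
        (state, used), depth = frontier.popleft()
        if loc(state) == target:
            return True
        if depth == k:
            continue
        successors = [(nxt, used) for nxt in step(state)]            # legit transitions
        if used < m_a:                                               # budget left
            successors += [(nxt, used + 1) for nxt in fault(state)]  # adversarial ones
        for cfg in successors:
            if cfg not in seen:
                seen.add(cfg)
                frontier.append((cfg, depth + 1))
    return False


def toy_step(state):
    """Hypothetical check: 0: if x == 0 goto 1 (deny) else goto 2 (grant)."""
    pc, x = state
    if pc == 0:
        return [(1, x)] if x == 0 else [(2, x)]
    return []


def toy_fault(state):
    """Arbitrary data fault on x (one bit here); the location is left untouched."""
    pc, x = state
    return [(pc, v) for v in (0, 1) if v != x]


def toy_loc(state):
    return state[0]


if __name__ == "__main__":
    init = {(0, 0)}
    print(adversarially_reachable(init, toy_step, toy_fault, toy_loc, 2, k=5, m_a=0))
    print(adversarially_reachable(init, toy_step, toy_fault, toy_loc, 2, k=5, m_a=1))
```

With a budget of 0 the check coincides with standard reachability and the grant location stays out of reach; a single fault on x is enough to reach it, which is the gap Proposition 1 points at.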


We redefine what it means for an analysis answering adversarial reachability 
to be correct, complete and k-complete. 


Definition 4. Let $V_A : (P, A, l) \mapsto \{1, 0\}$ be a verifier taking as input a program
$P$, an attacker $A$ with fault budget $m_A$ and a target location $l$.
- $V_A$ is correct given $A$ when for all $P$, $l$, if $V_A(P, A, l) = 1$ then $l$ is
adversarially reachable in $P$ for attacker $A$;
- $V_A$ is complete given $A$ when for all $P$, $l$, if $l$ is adversarially reachable for
attacker $A$ then $V_A(P, A, l) = 1$;
- if $V_A$ also takes an integer bound $n$ as input, $V_A$ is k-complete given $A$ when
for all integers $n$ and $P$, $l$, if $l$ is adversarially reachable in at most $n$ steps
then $V_A(P, A, l, n) = 1$.


5 Forkless Adversarial Symbolic Execution (FASE) 


In this section, we present our forkless algorithm for adversarial reachability. The 
analysis aims to find inputs and a fault sequence compatible with the considered 
attacker model and reaching the target location. Our primary goal is to deal 
with the potential path explosion induced by possible faults. Our design guiding 
principles are the following: 

— First, prevent path explosion as much as possible with a forkless fault en- 
coding. Yet, this forkless encoding leads to logical formulas potentially more 
complex and harder to solve in practice; 

— Second, reduce as much as possible the complexity of the created formulas, 
by avoiding the undue introduction of extra-faults along a path. 
5.1 Modelling Faults via Forkless Encoding


The forkless encoding aims to address the path explosion induced by the forking
treatment of fault injection in prior works. It is designed mainly for data faults
and consists of arithmetically wrapping the right-hand side of an assignment, as
shown in Figure 3 for an arbitrary data fault. The activation of this fault
location is determined by the symbolic Boolean value fault_here, and the
corrupted value of x is the fresh variable fault_value.


The point is to embed the fault injection as an expression inside the logical 
formula, without any explicit path forking at the analysis top-level, in order to 
let the analyzer reason about both legit executions and faulty executions at the 
same time — this is akin to path merging in some ways, except that we do it only 
for the treatment of fault injection (we could also see the approach as avoiding 
undue path splits). 

Multiple forkless arbitrary data encodings are possible. We chose to use the
ite expression operator, an inlined form of if-then-else at the expression level.
We also tried encodings inspired by branchless programming idioms (e.g.
$b \cdot x + (1 - b) \cdot y$ for $ite(b, x, y)$, with $b$ a Boolean value); in our experiments they
worked as well as the ite operator. Other data fault models are supported, such
as set, reset, bit-flips, etc. Test inversion is also supported by applying faults to
the condition of conditional jumps. Table 1 illustrates various forkless encodings.
Note that the forkless encoding is not designed for instruction corruptions or
instruction skips, as these modifications either yield permanent code modification
or span several instructions.
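
The equivalence between the ite operator and the branchless idiom can be checked exhaustively on small bit-vectors. The Python sketch below is only a sanity check under our own assumptions: Booleans are encoded as 0 or 1, values are unsigned, and the 8-bit default width is arbitrary.

```python
def ite(b: int, x: int, y: int) -> int:
    """Expression-level if-then-else."""
    return x if b else y


def branchless(b: int, x: int, y: int, bits: int = 8) -> int:
    """b*x + (1-b)*y, truncated to the bit-vector width."""
    return (b * x + (1 - b) * y) % (1 << bits)


def encodings_agree(bits: int = 8) -> bool:
    """Compare both encodings on every (b, x, y) of the given width."""
    values = range(1 << bits)
    return all(ite(b, x, y) == branchless(b, x, y, bits)
               for b in (0, 1) for x in values for y in values)


if __name__ == "__main__":
    print(encodings_agree())
```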


(a) Original statement:

    x := expr

(b) Forkless transformation for arbitrary data fault:

    x := ite fault_here ? fault_value : expr

Fig. 3: Forkless injection technique


Table 1: Forkless encodings for various fault models

Fault model    | Original instruction           | Forkless encoding
---------------|--------------------------------|----------------------------------------------------------
Arbitrary data | x := expr                      | x := ite fault_here ? fault_value : expr
Variable reset | x := expr                      | x := ite fault_here ? 0x00000000 : expr
Variable set   | x := expr                      | x := ite fault_here ? 0xffffffff : expr
Bit-flip       | x := expr                      | x := ite fault_here ? (expr xor 1 << fault_value) : expr
Test inversion | if cdt then goto 1 else goto 2 | if (ite fault_here ? !cdt : cdt) then goto 1 else goto 2
Trade-off. While these sorts of encoding indeed allow a significant path
reduction compared to forking approaches, the corresponding path predicates are
more complicated than standard path predicates, as they involve many extra
symbolic variables for deciding whether the faults occur and for emulating their
effect. We show later in this section how to reduce these extra variables.


5.2 Building Adversarial Path Predicates 


Adversarial symbolic execution requires modifications to Algorithms 2 and 3, as 
illustrated in Algorithms 4 and 5 respectively. 


Algorithm 4: Forkless assignment evaluation

Input: path predicate $\Phi$, assignment instruction x := expr, current number of
faults $nb_f$
Output: Updated $\Phi$

    Function eval_assign($\Phi$, x, expr) is
        $\Phi'$, expr', $nb_f$ := FaultEncoding($\Phi$, expr, $nb_f$)
        return $\Phi' \wedge (x \triangleq expr')$
    end


Algorithm 5: Forkless conditional jump evaluation

Input: path predicate $\Phi$, conditional jump instruction if cdt then $l_t$ else $l_e$
Data: fault counter $nb_f$, maximal number of faults $max_f$, worklist WL
Output: WL updated in place

    Function eval_conditional_jump($\Phi$, cdt, $l_t$, $l_e$) is
        if $\Phi \wedge cdt \wedge (nb_f \leq max_f)$ is satisfiable then
            Add ($\Phi \wedge cdt$, $l_t$) to WL
        end
        /* Idem for else branch ($\neg cdt$) */
    end


The assignment evaluation process embeds a wrapper encoding the fault in a
forkless manner. Note that FaultEncoding involves the declaration of fresh symbolic
variables for fault decisions and fault effects, hence the update of the path
predicate $\Phi$. Also, the fault counter $nb_f$ is updated, and a new potentially
faulted expression expr' is computed.


Note that checking that the fault counter $nb_f$ does not exceed the maximal
number of faults $max_f$ can be performed at different places. We found the best
trade-off is to augment the conditional jump queries to check whether we could
explore each branch without exceeding $max_f$ (see Algorithm 5), as checking at the
end of a path often involves exploring many infeasible faulty paths.


We refer to this set of modifications as Forkless Adversarial Symbolic Execution
(FASE).
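
To make the forkless idea concrete, the Python sketch below runs it on a two-instruction PIN check. It is a toy under explicit assumptions: an exhaustive search over a tiny domain stands in for the SMT solver, the program, the secret value and the attacker goal (access granted with a wrong PIN) are invented for the example, and the helper names are ours.

```python
from itertools import product


def ite(cond, then_value, else_value):
    return then_value if cond else else_value


def forkless_assign(expr, i):
    """Wrap expr so that fault location i may replace its value (arbitrary data fault)."""
    return lambda env: ite(env[f"fault_here_{i}"], env[f"fault_value_{i}"], expr(env))


def attack_exists(max_f, secret=3, domain=range(4)):
    """Single brute-force query covering legit and faulty executions of:

        ok := (pin == secret); if ok != 0 then goto grant else goto deny
    """
    ok_expr = forkless_assign(lambda env: int(env["pin"] == secret), 0)
    for pin, here, value in product(domain, (0, 1), domain):
        env = {"pin": pin, "fault_here_0": here, "fault_value_0": value}
        env["ok"] = ok_expr(env)      # x ≜ expr'
        nb_f = here                   # fault counter: number of activated faults
        wrong_pin = pin != secret     # attacker goal, checked by an unfaulted oracle
        if env["ok"] != 0 and nb_f <= max_f and wrong_pin:
            return env                # witness: input plus fault decisions
    return None


if __name__ == "__main__":
    print(attack_exists(max_f=0))
    print(attack_exists(max_f=1) is not None)
```

No path is forked for the fault: the budget constraint is simply conjoined to the branch condition, and the witness returned for a budget of one fault contains both the input and the fault decision.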


5.3 Algorithm Properties 
We now consider the properties of the FASE algorithm. 


Proposition 2. The FASE algorithm is correct and k-complete for adversarial 
reachability. 


Sketch of proof. If our algorithm finds an adversarial path reaching the target lo- 
cation l, by providing specific input values and a fault sequence, then an attacker 
executing the program with the provided inputs and performing the proposed 
faults will reach its goal. Our algorithm is based on symbolic execution with 
bounded path depth and explores all possible attack paths according to the 
considered attacker model, hence its k-completeness for adversarial reachability. 


Tightness of FASE. Consider a single path with no branching instruction
and an assert statement to be checked at the end, together with $f$ possible fault
locations and a maximum of $m$ faults. Then the forking SE yields up to $C_f^m$ paths
to analyze, and as many queries to send to the solver. In the same scenario, FASE
will analyze only the original path, and send a single query to the solver.

Still, the Forkless encoding increases query complexity, as shown in Section 
7. We present in the remainder of this section two mitigation techniques. 


5.4 Optimization via Early Detection of Fault Saturation 
(FASE-EDS) 


Algorithm 6: FASE-EDS conditional jump evaluation

Input: path predicate $\Phi$, conditional jump instruction if cdt then $l_t$ else $l_e$
Data: fault counter $nb_f$, maximal number of faults $max_f$, worklist WL
Output: WL updated in place

    Function eval_conditional_jump_EDS($\Phi$, cdt, $l_t$, $l_e$) is
        if $\Phi \wedge cdt \wedge (nb_f < max_f)$ is satisfiable then
            Add ($\Phi \wedge cdt$, $l_t$) to WL
        else if $\Phi \wedge cdt \wedge (nb_f == max_f)$ is satisfiable then
            Stop injection in this path
            Add ($\Phi \wedge cdt$, $l_t$) to WL
        end
        /* Idem for else branch ($\neg cdt$) */
    end
The first angle we explore to minimize query complexity is to reduce the
number of injection points by stopping the injection process as soon as possible. 
Indeed, fewer injection points mean fewer extra symbolic variables and in general 
smaller and simpler queries for the SMT solver. We call this optimization Early 
Detection of fault Saturation, and write FASE-EDS when it is activated. 

Its difference compared to FASE is in handling conditional jumps, illustrated 
in Algorithm 6. Instead of checking whether a branch can be explored without 
exceeding the maximum number of faults, we double the check: (1) first we check 
whether the branch can be explored with strictly fewer faults than allowed. If 
the query is satisfiable, the analysis continues down that branch as usual; (2) if 
not satisfiable, we check whether the branch is feasible with exactly the maximal 
number of faults allowed. If not, the branch is infeasible and we stop as usual. Yet, 
if it is feasible, then we know that we have spent all allowed faults. We can thus 
continue the exploration without injecting any new fault in the corresponding 
search sub-tree, leading to simpler subsequent queries. 
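
The two-stage check can be summarized by its decision logic alone. In the Python sketch below the solver is replaced by a toy oracle that knows with which fault counts a branch is feasible; this oracle, the string tags and the function names are assumptions made purely for illustration.

```python
def make_oracle(feasible_fault_counts, max_f):
    """Toy solver: the branch is feasible exactly with the listed numbers of faults."""
    def is_sat(comparison):
        if comparison == "<":
            return any(n < max_f for n in feasible_fault_counts)
        return any(n == max_f for n in feasible_fault_counts)
    return is_sat


def eval_conditional_jump_eds(is_sat, branch, worklist):
    if is_sat("<"):                       # feasible with strictly fewer faults
        worklist.append((branch, "inject"))
    elif is_sat("=="):                    # feasible only by spending the whole budget
        worklist.append((branch, "no_more_injection"))
    # otherwise the branch is infeasible within the budget and is dropped


if __name__ == "__main__":
    worklist = []
    eval_conditional_jump_eds(make_oracle({0, 1}, max_f=2), "then", worklist)
    eval_conditional_jump_eds(make_oracle({2}, max_f=2), "else", worklist)
    print(worklist)
```

A branch tagged no_more_injection is explored with standard assignment evaluation from then on, so no further fault variables are added in its sub-tree.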


Proposition 3. FASE-EDS is correct and k-complete for the adversarial reach- 
ability problem. 


Proof. FASE-EDS remains correct as it does not modify the path predicate 
computation, and it remains k-complete as it only prunes fault injections that 
are actually infeasible — and would have been proven so by the solver, later in 
the solving process. 


5.5 Optimization via Injection on Demand (FASE-IOD) 


The second angle explored to reduce query complexity through the reduction of 
injection points is to inject faults on demand, only when they are truly needed. 
We call this optimization Injection On Demand, and write FASE-IOD when it 
is activated. 

To inject faults on demand, we now build two path predicates along a path:
the working path predicate $\Phi$ based on which solver queries are built (where we
try to minimize fault injection), and the normal adversarial path predicate $\Phi_F$
computed in previous sections (encompassing all the faults seen so far).


Algorithm 7: FASE-IOD assignment evaluation

Input: path predicate $\Phi$, faulted path predicate $\Phi_F$, assignment instruction
x := expr, current number of faults (in $\Phi_F$) $nb_f$
Output: Updated $\Phi$, $\Phi_F$

    Function eval_assign_IOD($\Phi$, $\Phi_F$, x, expr) is
        $\Phi_F'$, expr', $nb_f$ := FaultEncoding($\Phi_F$, expr, $nb_f$)
        return ($\Phi \wedge (x \triangleq expr)$, $\Phi_F' \wedge (x \triangleq expr')$)
    end
Algorithm 8: FASE-IOD conditional jump evaluation

Input: path predicate $\Phi$, faulted path predicate $\Phi_F$, conditional jump
instruction if cdt then $l_t$ else $l_e$
Data: fault counter $nb_f$, maximal number of faults $max_f$, under-approximation
counter under_counter, worklist WL
Output: WL updated in place

    Function eval_conditional_jump_IOD($\Phi$, $\Phi_F$, cdt, $l_t$, $l_e$) is
        if $\Phi \wedge cdt \wedge (nb_f \leq max_f)$ is satisfiable then
            Add ($\Phi \wedge cdt$, $\Phi_F \wedge cdt$, $l_t$) to WL
        else if under_counter < $max_f$ then
            if $\Phi_F \wedge cdt \wedge (nb_f \leq max_f)$ is satisfiable then
                $\Phi$ := $\Phi_F$
                under_counter := under_counter + 1
                Add ($\Phi \wedge cdt$, $\Phi_F \wedge cdt$, $l_t$) to WL
            end
        end
        /* Idem for else branch ($\neg cdt$) */
    end


Algorithms are updated accordingly. In particular, assignment evaluation is
duplicated as shown in Algorithm 7: the normal symbolic assignment, with the
original right-hand-side expression expr, is added to $\Phi$, while $\Phi_F$ is updated with
the fault encoding of the assignment, expr'.


The on-demand reasoning takes place in the conditional jump instruction
process detailed in Algorithm 8. The basic idea is to first check branch feasibility
with the simpler path predicate $\Phi$, encompassing the least number of faults. We
continue this way as long as we can, meaning we rely on standard reachability
as much as we can.

When encountering a branch infeasible with $\Phi$, we then check whether this
branch is feasible with all the possible faults seen so far, i.e. using $\Phi_F$. If not,
exploration of this branch stops; otherwise we know that $\Phi$ does not encompass
enough faults to go further. We then replace $\Phi$ by $\Phi_F$ (called a switch) at this
stage, and thus continue with strictly more faults. Note that this is
straightforward as $\Phi_F$ and $\Phi$ only differ on fault injections. Then again, the new
$\Phi$ will not accumulate any fault (until a new switch) while $\Phi_F$ continues
accumulating all possible faults.

As a bonus, the number of path predicate switches gives us an
under-approximation under_counter of the number of faults already needed in the
path under analysis. We use it to stop the injection early, when at least $max_f$
faults have been used.
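
The switch logic can be sketched with a deliberately abstract model. In the Python code below a path predicate is reduced to the set of fault locations it encompasses, and a branch is taken to be feasible when the fault locations it needs are all available; this model, the omission of the explicit budget constraint on the fault counter, and all names are simplifying assumptions of ours rather than the algorithm as implemented.

```python
def eval_conditional_jump_iod(phi, phi_f, needs, under_counter, max_f):
    """Return the updated (phi, under_counter), or None if the branch is dropped.

    phi, phi_f: frozensets of fault locations encompassed by the working and the
    fully faulted predicate; needs: locations the branch requires to be feasible.
    """
    if needs <= phi:                      # feasible with the simpler predicate
        return phi, under_counter
    if under_counter < max_f and needs <= phi_f:
        return phi_f, under_counter + 1   # switch: continue with strictly more faults
    return None                           # infeasible even with all faults seen so far


if __name__ == "__main__":
    phi, phi_f = frozenset(), frozenset({0, 1})
    print(eval_conditional_jump_iod(phi, phi_f, frozenset(), 0, max_f=1))
    print(eval_conditional_jump_iod(phi, phi_f, frozenset({1}), 0, max_f=1))
    print(eval_conditional_jump_iod(phi, phi_f, frozenset({1}), 1, max_f=1))
```

The second call performs a switch and increments the under-approximation counter; the third shows the early stop once that counter has reached the budget.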


Proposition 4. FASE-IOD is correct and k-complete for the adversarial reach- 
ability problem. 


Proof. FASE-IOD explores the same feasible paths as FASE, hence preserving 
its properties. 
5.6 Optimizations Combination


Algorithm 9: FASE-IOD and FASE-EDS combination, conditional jump evaluation

Input: path predicate $\Phi$, faulty path predicate $\Phi_F$, conditional jump
instruction if cdt then $l_t$ else $l_e$
Data: fault counter $nb_f$, maximal number of faults $max_f$, under-approximation
counter under_counter, worklist WL
Output: WL updated in place

    Function eval_conditional_jump_EDS_IOD($\Phi$, $\Phi_F$, cdt, $l_t$, $l_e$) is
        if $\Phi \wedge cdt \wedge (nb_f \leq max_f)$ is satisfiable then
            Add ($\Phi \wedge cdt$, $\Phi_F \wedge cdt$, $l_t$) to WL
        else if under_counter < $max_f$ then
            if $\Phi_F \wedge cdt \wedge (nb_f < max_f)$ is satisfiable then
                $\Phi$ := $\Phi_F$
                under_counter := under_counter + 1
                Add ($\Phi \wedge cdt$, $\Phi_F \wedge cdt$, $l_t$) to WL
            else if $\Phi_F \wedge cdt \wedge (nb_f == max_f)$ is satisfiable then
                $\Phi$ := $\Phi_F$
                Stop $\Phi_F$ update and queries
                Add ($\Phi \wedge cdt$, $\Phi_F \wedge cdt$, $l_t$) to WL
            end
        end
        /* Idem for else branch ($\neg cdt$) */
    end


Both optimizations can be combined as illustrated in Algorithm 9. Taking
FASE-IOD as a basis, saturation detection is added in the faulted path predicate
$\Phi_F$ queries at conditional branch handling. If saturation is detected, the main
path predicate switches to $\Phi_F$, but $\Phi_F$ stops being updated and queried further
down that path, which stops fault injection.


Proposition 5. The combination of FASE-EDS and FASE-IOD is correct and 
k-complete for the adversarial reachability problem. 


Proof. This combination also explores all possible paths for the considered at- 
tacker models, like FASE, hence preserving its properties. 


6 Implementation 


We now provide details about our forkless adversarial symbolic execution (FASE)
implementation, named BINSEC/ASE, for Adversarial Symbolic Execution. The
code is open source⁷.


⁷ https://github.com/binsec/binsec-ase
Binary-level Fault Injection. While our method works for any program
abstraction level, we chose to implement it for the binary level, which makes more
sense in many security scenarios. We implement our forkless adversarial symbolic 
execution on top of the BINSEC symbolic engine [38,40,10]. It has already been 
used in a number of significant case studies [9,81,80,36,37], and it is notably able 
to achieve bounded verification (k-completeness) and to reasonably deal with 
symbolic pointers [44]. 

We modified the path predicate computation of BINSEC 0.4.0 as described in
Section 5, and implemented our dedicated optimizations FASE-EDS, FASE-IOD
and FASE EDS+IOD. BINSEC consists of 60 kloc of OCaml and our modifications
add 6 kloc. The attacker goal is specified as a local predicate to reach,
using BINSEC directives. We currently support data faults such as arbitrary
modification, bit-flip and reset. Test inversion is emulated by faulting the
condition of conditional jumps. We let the user define an injection target range,
made of multiple code address intervals. For large programs, this enables focusing
on the security-critical sections. Finally, we also provide a blacklist of
memory locations which will never be faulted. The blacklist is mostly used for
the stack register (esp in x86, which is concretized in the analysis) and the
program counter, as our fault model includes neither tampering with the stack nor
arbitrary control faults.


Details. Our exploration strategy is depth-first, and the underlying SMT solver is
Bitwuzla [71]. We constrain the faulted values to differ from the original values
in fault encodings, such that only true corruptions are reported as active faults.


7 Evaluation 


We now evaluate our new algorithm for software verification against multi-fault 
attacks. We consider the following research questions. 

- RQ1: is our tool correct and complete? In particular, can we find attacks
on vulnerable programs and prove resistant programs secure?
- RQ2: can we scale in number of faults without path explosion?
- RQ3: what is the impact of our optimizations?
Besides this evaluation, we also show the use of our method in a number of 
different security scenarios (Section 7.5), and on a larger case study (Section 8). 


7.1 Experimental Setting 


The Machine Used. We ran our experiments on a cloud machine with an Intel
Dual Xeon 4214R processor with 48 CPU cores and 384GB of RAM. Experiments
ran in parallel on the 48 cores, each run using only one core.


The Attacker Model chosen in this evaluation can perform a varying number 
of faults. Its goal is expressed as a security oracle directly written in C for each 
benchmark, the computation of which is not faulted. 
The Benchmark used here is a standard set of programs from the SWiFI
literature on physical fault injections and high-security devices, characterized
in Table 2. First, we take the 8 versions of VerifyPIN from the FISSC [42]
benchmark suite, dedicated to the evaluation of physical fault attack analyses.
VerifyPIN is an authentication program. There are one unprotected and 7 different
protected versions, some vulnerable, some resistant to one test inversion fault.
We added 2 manually unrolled versions of the unprotected VerifyPIN, with a PIN
size of 4 and 16, to add diversity in the benchmarks with programs without loops.
An oracle is provided by FISSC, checking if the user PIN truly corresponds to 
the reference PIN. Second, we take the 2 versions of the npo2 program from 
Le et al. [65], together with their oracles. Npo2 is a program computing an 
integer’s upper power of two. The attacker’s goal is to perform a silent data 
corruption, i.e. change the end result without triggering countermeasures. One 
version is vulnerable to one arbitrary data fault, the second is resistant due to 
extra arithmetic checks. 


Compilation. The benchmarks are written in C and have been compiled with
gcc for the Intel x86-32 architecture, using the flag "-O0" to preserve
countermeasures. For BINSEC compatibility, we use the "static" flag to include the
necessary library functions directly in the binary.


Table 2: Benchmark characteristics and statistics of a standard SE analysis
(the last four columns report the BINSEC analysis with no fault)

Program group (#)          | C loc   | x86 loc | #instructions (explored) | #paths | #branches in a path | Time
---------------------------|---------|---------|--------------------------|--------|---------------------|-------------
Section 7                  |         |         |                          |        |                     |
VerifyPINs (8)             | 80-140  | 160-215 | 192-269                  | 1      | 17-34               | < 0.1s
VerifyPIN unrolled (2)     | 40-85   | 140-430 | 142-442                  | 5-17   | 5-17                | < 0.1s
npo2 (2)                   | 50      | 200-220 | 607-653                  | 3      | 31-33               | < 0.1s
Section 8                  |         |         |                          |        |                     |
Wookey bootloader          | 3.2k    | 2350    | 290k                     | 17     | 18k                 | 9s
Section 7.5                |         |         |                          |        |                     |
CRT-RSA (3)                | 125-170 | 400-600 | 108k-29M                 | 1      | 5k-1.3M             | 0.4s - 1m27
Secret keeping machine (2) | 100-200 | 240-360 | 1k-1.3k                  | 1      | 130-150             | < 0.1s
VerifyPIN_0 with SecSwift  | 80      | 430     | 430                      | 1      | 22                  | < 0.1s


BINSEC Settings. We limit the maximal depth of an analysis to the depth
necessary to perform an exhaustive non-faulty analysis, rounded up to the next
hundred. We exhaustively explore all the possible paths up to this bound and
do not stop at the first identified attack, in order to have comparable results.
We set the global analysis timeout to 1 day. We fault values and not addresses,
we do not directly fault the stack pointer or the program counter, and we do
not fault the status flags unless explicitly specified.


7.2 Correctness and Completeness in Practice (RQ1) 


We first show that our tool works as expected on several programs with known
ground truth. (1) We check that, with no fault allowed, no attack is
found in any of the benchmarks; (2) We check that the insecure npo2
program is vulnerable to a single arbitrary data fault while the secure version is
not, although the latter can still be exploited with two faults; (3) According to
their authors, the VerifyPIN versions 0 to 4 are vulnerable to one test inversion,
while VerifyPIN 5 to 7 are resistant to it. We indeed reproduce these results.
When allowing two faults, all VerifyPIN versions become vulnerable; (4) When using
one arbitrary data fault against the VerifyPINs, all versions are found vulnerable.
We manually check that the identified attack paths make sense; (5) Our manually
unrolled versions of VerifyPIN do not contain conditional branching instructions
in the targeted function, making them resistant to test inversion. We check that
this is the case, while they are still vulnerable to a single arbitrary data fault.


Conclusion. Our tool indeed can showcase a program vulnerability to fault 
injection attacks and prove resistance to fault injection attacks, as expected by 
the correctness and k-completeness properties of the underlying algorithms. 


7.3 Scalability (RQ2) 


For this evaluation, we focus on an attacker capable of arbitrary data faults, as 
those weigh the heaviest on the analysis. 

We take FASE-IOD as our best performing technique (see Section 7.4). We
evaluate here its capability to handle multiple faults and avoid path explosion,
compared to the forking technique. Results are illustrated in Figures 4 and 5. Note
that all FASE variants explore the same number of paths, and are thus represented
as FASE in Figure 5. For each benchmark, we took the arithmetic mean
over 100 runs. Values presented here are the geometric mean over the benchmarks.
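
The aggregation just described (arithmetic mean over the runs of each benchmark, then geometric mean over the benchmarks) can be sketched as follows in Python. The timings are made-up placeholders chosen for easy arithmetic, not measurements from our experiments, and the function name is ours.

```python
from statistics import fmean, geometric_mean


def aggregate(runs_per_benchmark):
    """Arithmetic mean per benchmark, then geometric mean across benchmarks."""
    per_benchmark = [fmean(runs) for runs in runs_per_benchmark.values()]
    return geometric_mean(per_benchmark)


if __name__ == "__main__":
    # Hypothetical timings in seconds: two benchmarks, two runs each
    timings = {"bench_a": [1.0, 3.0], "bench_b": [8.0, 8.0]}
    print(aggregate(timings))  # per-benchmark means 2.0 and 8.0, geometric mean about 4.0
```

The geometric mean keeps one slow benchmark from dominating the summary, which matters when per-benchmark times span several orders of magnitude.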

FASE-IOD is 10 times faster than Forking for 1 fault, and 200 times faster
for 2 faults on average. For the best-case benchmark, we are 224 times faster for
1 fault and 6121 times faster for 2. From three faults onward, Forking experiences
timeouts, rendering values non-comparable. Half of the benchmarks time out for
3 faults, three quarters for 4 faults, 11 out of 12 for 6 faults and all of them after
that. FASE-IOD never times out in this experiment. This scaling is enabled by
avoiding path explosion. On average, Forking explores 50 times more paths for
2 faults than for one, while FASE-IOD only explores 3 times more paths. From
Figure 4, we see that FASE on its own already scales better than Forking, being
3 times faster for 1 fault and 108 times faster for 2, and never experiencing
timeouts either.


Conclusion. FASE-IOD shows improved scalability in terms of the maximum 
number of faults allowed, for the arbitrary data fault model, compared to the 
forking technique. 
7.4 Performance Optimization (RQ3)


We evaluate our forkless variants: FASE, FASE-EDS, FASE-IOD and FASE 
EDS+IOD, to determine which performs best for arbitrary data faults. Results 
are illustrated in Figures 4, 5 and 6. 

We again vary the maximum number of faults from 1 to 10. Note that all
FASE variants explore the same number of paths for each number of faults, as
the optimizations reduce the number of faults injected but lose neither
correctness nor k-completeness. FASE indeed generates complex queries⁸, which take
on average around twice as long to solve as Forking queries. FASE-EDS
gains a little in that regard: FASE queries take only 1.04 times longer to
solve than FASE-EDS queries, on average over all fault numbers. The real
improvement comes with the on-demand logic of FASE-IOD (2.02 times faster on
average over all fault numbers) and FASE EDS+IOD (also 2.02 times), where query
complexity drops to the level of Forking. This improvement in query complexity is
achieved algorithmically, at the price of query creation. However, because more
queries are arithmetically simplified, fewer queries are ultimately sent to the
solver by FASE-IOD (0.88 times as many as FASE, on average over all fault values)
and FASE EDS+IOD (0.98 times). FASE-EDS sends approximately the same number of
queries as FASE. The number of queries sent to the solver explodes for Forking,
in correlation with the path explosion it experiences. In terms of performance,
two trends appear as the number of faults allowed increases: FASE and FASE-EDS
tend to be between 2 and 3 times slower than FASE-IOD and FASE EDS+IOD. In the end,
FASE-IOD proves to be the fastest optimization (1.1 times faster than FASE
EDS+IOD on average over all numbers of faults), likely due to the combination
of on-demand logic and fewer queries than FASE EDS+IOD.
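
The ite operators counted in the queries come from the forkless encoding, where each faultable assignment is wrapped in an if-then-else controlled by an activation bit, and the number of active bits is bounded by the fault budget. The sketch below illustrates the idea on a two-assignment PIN check. It is a toy under stated assumptions: the program, the names `check_pin` and `adversarially_reachable`, and the brute-force search standing in for an SMT solver are all hypothetical and not taken from the paper's implementation.

```python
from itertools import product


def ite(cond, then_value, else_value):
    """If-then-else term, as used in the forkless encoding."""
    return then_value if cond else else_value


def check_pin(user, ref, b, f):
    """Toy program with two faultable assignments.

    b[i] is the activation bit of fault i, f[i] the value it injects.
    """
    diff = ite(b[0], f[0], user ^ ref)           # diff := user XOR ref
    granted = ite(b[1], f[1], int(diff == 0))    # granted := (diff == 0)
    return granted


def adversarially_reachable(max_faults, domain=range(4)):
    """Search for a wrong PIN that is accepted with at most `max_faults` faults.

    A brute-force enumeration replaces the solver query of a real analysis.
    """
    for user, ref in product(domain, repeat=2):
        if user == ref:
            continue
        for b in product((0, 1), repeat=2):
            if sum(b) > max_faults:              # fault budget constraint
                continue
            for f in product(domain, repeat=2):
                if check_pin(user, ref, b, f) == 1:
                    return user, ref, b, f
    return None


if __name__ == "__main__":
    print(adversarially_reachable(0))
    print(adversarially_reachable(1))
```

A single symbolic path covers every fault choice here; the price is the extra ite terms in the formula, which is what the on-demand variants try to reduce.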


Conclusion. We retain FASE-IOD as our best performing forkless adversarial
algorithm, up to 3.06 times faster than FASE.


7.5 Other Experiments and Fault Models 


CRT-RSA. Puys et al. [78] describe three versions of CRT-RSA: unprotected,
the Shamir version and the Aumüller version. Only the last one is shown to resist
the BellCoRe attack [16], which uses a single reset fault to break the cryptography.
We were able to automatically reproduce the attack with 1 reset fault on the
unprotected version of CRT-RSA, after 3s of analysis, and we were not able to
find attacks on the other two versions within 10 days.


Secret-keeping Machine. Dullien [41] proposes two versions of a secret-keeping
machine. The one based on linked lists is manually shown to be exploitable by
an attacker able to perform a single bit-flip in the memory (not in registers),
while the array version is shown to be secure against that. For this benchmark,
we activated faults on variables used as addresses. We were able to reproduce
the attack on the linked list implementation with one bit-flip fault and to show
that the array implementation is secure for this fault model. In addition, if we
allow faults in registers too, the array implementation becomes vulnerable.


⁸ When counting the number of ite operators introduced in queries, from having barely
any in a run without faults, we reach around 2,800 ite per query on average for FASE
and 1,500 for FASE-IOD for one fault.


SecSwift Countermeasure. We applied the SecSwift countermeasure, an LLVM-
level protection developed by STMicroelectronics [45,27], to VerifyPIN version 0.
We were able to find attacks yielding an early loop exit on this binary with either
a single test inversion or a single arbitrary data fault. Since these paths belong
to the CFG of the program, these attacks are not unexpected, yet it is still
interesting that our method finds them automatically.


8 Case Study: the WooKey Bootloader


We now confront our tool with a real-life security system, WooKey.


Presentation of WooKey. First presented in 2018 by ANSSI, the French sys- 
tem security agency, the WooKey platform [14,89] is “a custom STM32-based 
USB thumb drive with mass storage capabilities designed for user data encryp- 
tion and protection, with a full-fledged set of in-depth security defenses”. Their 
choice to be open source and open hardware makes WooKey a relevant case 
study: it is a real-life, complex device, security focused and available for
reproducibility. Note also that WooKey has been extensively analyzed, as it was the
target of an ANSSI cybersecurity challenge for security professionals [5].


Security Scenario and Goal of our Study. We focus on the WooKey bootloader,
a dual-bank system enabling hot firmware updates. The system is hardened:
in particular, redundant test protections are present in critical sections to
protect against test inversion faults. We consider the same attacker model as the
ANSSI challenge did [5]: the attacker seeks to manipulate the bootloader logic to
boot on the older firmware, which is more likely to contain security vulnerabilities.
We also consider an attacker able to perform a single arbitrary data fault. We see in
Table 2 that the WooKey bootloader is orders of magnitude larger than the
programs used for evaluation in Section 7. WooKey is available as C code. We
compile it like we did for the evaluation benchmarks (Section 7.1).
We conduct the following three analyses:

1. automatically analyze WooKey at binary-level to check whether we are able
to find previously known faults [63], and/or new ones: we are indeed able
to find the two faults identified by prior work [63] (A1, A2), as well as an
attack they do not mention (A3);

2. automatically analyze at binary-level the patched version of WooKey proposed
by Lacombe et al. [63]: we found that the proposed patch indeed blocks the
two known attacks (A1 and A2), but not the new attack (A3);

3. propose a definitive patch by adding a counter-measure for A3 and removing
parts of the counter-measures which are shown to be useless here. The patch
is proven correct w.r.t. our attack model.

We discuss these results in the following and we present briefly in Section 8 the
discovery of two more known faults. Overall, it demonstrates that our technique
can scale to binary-level real-size systems.


Analyze Key Parts of WooKey. Lacombe et al. find an attack in the
loader_exec_req_selectbank function (A1) and another in the
loader_exec_req_flashlock function (A2). They correspond to data corruption in
branching conditions. We are able to find both attacks, linking faults back to their
locations in the C code with debug information. We also find an additional attack,
faulting another part of the loader_exec_req_flashlock function (A3).


Analyze a Security Patch of WooKey. We now evaluate the protection 
scheme proposed by Lacombe et al. [63] for these attacks. It consists of four 
extra counter-measures named from CM1 to CM4. We found indeed that the
full protection prevents attacks A1 and A2, as claimed by the authors of the
patch. Yet, our analysis shows that the protection does not prevent the new 
attack A3. 


Propose a New Patch and Evaluate It. We manually inspect these different
analysis results to understand what happens. In particular, we were able to
identify the root cause of A3 and propose a dedicated countermeasure for it
(named CMA). Also, by analyzing each counter-measure in isolation, we were
able to establish that counter-measures CM1 and CM3 do not block any
attack path, as they are redundant with other tests in the code, and can be safely
removed. Overall, our new patch (CMA + refined former patch) is shown by
our tool to protect against all the attacks, for an attacker able to perform one
arbitrary data fault (Table 3).


Table 3: Table summarizing the effects of countermeasures

Protection scheme | A1 | A2 | A3 (new)
Normal WooKey | ✓ | ✓ | ✓
Prior patch (CM1+CM2+CM3+CM4) | ✗ | ✗ | ✓
Our patch (CM2+CM4+CMA) | ✗ | ✗ | ✗

Legend - ✓: attack path found by our tool / ✗: no attack found


Other Attacks on WooKey. We were also able to find two other known attacks
on WooKey. (Attack vector combination) The iso7816 library, used in
WooKey for secure communication, presents a vulnerability to fault injection
which enables a software buffer overflow in function SC_get_ATR [63]. Using
an attacker with a single arbitrary data fault, we were able to reproduce
this attack. (Faulty redundant test) Martin et al. [68] show an incorrect
implementation of a redundant test meant to prevent single test inversion faults
in the loader_set_state function. We reproduce this result.


9 Discussion 


Fault Models. Our current approach does not support advanced control faults 
such as instruction corruption or instruction skip. Instruction corruption is out 
of scope as it permanently changes an instruction, while we modify computation 
results. It is related to self-modification, a notoriously difficult point to address in 
adversarial binary-level code analysis [17,77]. Instruction skip (or other arbitrary 
control jumps) could be modeled by local modification of the program counter, 
yet at the price of a huge path explosion. Also, regarding micro-architectural 
attacks, modeling Spectre attacks is difficult due to the speculative windows 
mechanism and its associated rollback. 


Other Formal Methods. While in the paper we focus on symbolic execution, 
we believe the main optimization ideas developed here can be used with other 
formal techniques, e.g. Bounded Model Checking [29,31], Abstract Interpretation 
[34] or CEGAR [30]. Note that for each of them, fault injection may result either 
in path explosion or precision loss. Still, our forkless encoding should be able to 
help at least all approaches based to some extent on path unrolling. 


Other Properties. The forkless encoding can surely benefit other classes of 
properties to be achieved by the attacker, especially those known to be sup- 
ported by (extensions of) symbolic execution, for example: trace properties such 
as use-after-free, k-hyperreachability properties (secret leakage, privacy leakage, 
violation of constant-time, etc.) [36], the recent robust reachability proposal [48] 
for replicable bugs, etc. Our formalism itself is quite generic and can accom- 
modate a wide range of properties, as we mainly keep the property unchanged 
but modify the underlying transition system. We could for example imagine an 
attacker willing to activate a non-terminating execution (denial of service). 


Forkless Encoding and Instrumentation. Several prior works use code-level 
instrumentation [68] or LLVM-level instrumentation [76,63,65] in order to lever- 
age standard program analyzers as is. The forkless encoding we propose can 
also be used this way, for more flexibility but without additional optimizations. 
Actually, we performed some experiments with Klee and a C-level forkless instru- 
mentation, and do observe significant improvement over forking instrumentation. 


10 Related Work 


SWiFI. Prior work in SWiFI has already been discussed in Section 3. All meth- 
ods in this domain consider low-level formalism: C [28,68], LLVM [76,63], binary 
[25,15,20,50]. Half of the techniques rely on the mutant approach [28,79,49,25,50], 
and the other half relies on forking [76,15,20,63]. While most approaches target 
attack finding (with symbolic execution and bounded model-checking), some do
aim at full verification [79], especially with deductive verification [68,28]. Very
few works consider multi-faults [76,63,68]. Interestingly, Lacombe et al. [63]
propose a static way of reducing injection points on C programs, which is
complementary to our own method; still, static analysis at binary-level is known
to be hard. Note that a few methods do consider instruction skips [49,20,50], yet
with path explosion issues.


Robustness Analysis. SWiFI is also used for robustness evaluation 
[64,74,56,88,65,72,32,90], in order to verify the correct behavior of error han- 
dling mechanisms. They rely also on forking or mutant techniques. The fault 
models are similar to hardware fault injection, yet multi-fault is not really an 
issue there, as faults are supposed to originate from safety issues (e.g. cosmic 
rays) and have no reason to accumulate unreasonably. 


Formalizations and Fault Models. While it is common in the field of au- 
tomated formal verification of cryptographic protocols to consider models of 
attackers (typically, extensions of the “Dolev-Yao” model) — either by specifying 
what the attackers can do [2] or what they cannot do [7], only very few for- 
malizations of software-level attacker capabilities have been proposed so far. In 
software security, control-flow integrity attacks have been categorized by the ca- 
pability an attacker needs [21], but these efforts have been restricted to manual 
reasoning. Interestingly, Given-Wilson et al. [51] propose a formalization of fault 
injection using Turing machines, but to our knowledge, no algorithm has been 
built for it. Also, Fournet et al. [46] propose a type system for program-level 
non-interference, taking into account an active adversary modeled as adversarial 
components able to perform any action at certain steps of the program. 


Mutation Testing. Sometimes called software fault injection, mutation test- 
ing [75,33] aims to generate a comprehensive test suite by building test cases 
discriminating various mutants of a program, and is recognized as a very pow- 
erful testing criterion. As it focuses on coverage, mutant explosion cannot be 
avoided. Dedicated SE techniques [73,8,11,67] have been designed. 


11 Conclusion 


We formalize the concept of adversarial reachability, extending standard reach- 
ability to include the presence of an advanced attacker in program analysis, and 
we propose a dedicated symbolic algorithm for adversarial reachability, integrat- 
ing a novel forkless encoding of faults together with dedicated optimizations. 
Our technique is shown to significantly reduce the number of paths to explore,
and scales up to 10 faults on a standard SWiFI benchmark, where prior forking
attempts time out for 3 faults. Also, we show that our method scales to realistic
size examples, such as the WooKey project, where we have been able to replay
known fault attacks and even to find a vulnerability not mentioned in a recently
proposed countermeasure patch.


Abstract. With the rapid transition to distance learning, automatic
grading software becomes more important to both teachers and students. 
We study the problem of automatically grading the regular expressions 
submitted by students in courses related to automata and formal lan- 
guage theory. In order to utilize the semantic information of the regular 
expression, we define a declarative logic that can be described by regular 
language and at the same time has natural language characteristics, and 
use it for the following tasks: 1) to assign partial grades for incorrect 
regular expressions and 2) to provide helpful feedback to students to 
make them understand the reason for the grades and a way to revise 
the incorrect regular expressions into correct ones. We categorize the 
cases when students’ incorrect submissions deserve partial grades and 
suggest how to assign appropriate grades for each of the cases. In order 
to optimize the runtime complexity of the algorithm, two heuristics based 
on automata theory are proposed and evaluated on the dataset collected 
from undergraduate students. In addition, we suggest Regex2NL which 
translates regular expressions to natural language descriptions to give in- 
sight to students so that they can understand how the regular expressions 
work. 


Keywords: regular expressions · MSO logic · automated grading system
· automata theory


1 Introduction 


Regular expressions (regexes) are a great tool for the pattern matching problem 
as they can effectively describe pattern structures. Regexes are widely used 
in software applications such as search engines, text processing, programming 
languages, and compilers due to their compact representations. Although most 
developers find that regexes are powerful and flexible tools, they also feel that
regexes are very difficult to learn for many reasons such as readability, validity, 
reliability, and so on [7,16]. 


There have been several interesting approaches to automatically grading
student submissions in automata-related courses in the online education
environment. Alur et al. [2] propose a technique for automatically grading students’
DFA constructions in automata courses while generating high-level hints that help
students understand how to correct their wrong submissions. For instance,
they introduce the DFA edit difference to quantify the difference
between the correct DFA and a student’s DFA, and MOSEL (an MSO-equivalent
declarative logic) to capture the case where the student’s submission
corresponds to a different logic formula in MOSEL. Later, D’Antoni et al. [6] utilize
the DFA edit difference in order to generate natural language feedback explaining
how to correct the submitted DFA. They also conduct an online survey to collect
students’ feedback about the quality, usability, and effectiveness of their grading
system.


Kakkar [10] studies a similar problem, namely, the problem of grading regexes
instead of DFAs. Inspired by the DFA edit difference [2], Kakkar proposes a
new criterion called ‘Regex Edit Distance’, which is based on the string
edit-distance between students’ regexes and correct ones. However, both works
suffer from the limitation that ‘optimal’ answers for the problems must be given by
TAs, as they compare the students’ submissions with the answers to assign partial
grades. Recently, D’Antoni et al. [5] propose Automata Tutor v3 (abbreviated
to AT v3 hereafter), which is the latest version of the previous work [2]. In
AT v3, they include automated grading and feedback generation for a variety
of new automata problems including the problems that ask to create regexes,
context-free grammars, pushdown automata, and even Turing machines for a
given description (e.g., a natural language description, or an automaton, or a
grammar that belongs to a different class). However, they also rely on the string
edit-distance for grading regexes, similarly to the work of [10]. Note that AT v3
provides counterexamples for incorrect regexes as feedback to students, such as
strings that should (or should not) be accepted.


In this paper, we introduce an automated grading framework for regular ex- 
pressions that gives reasonable grades and helpful feedback. The overall structure 
of our regex grading scheme is illustrated in Fig. 1. As the goal of the regex
construction problem is to construct a regex from a natural language description,
the TA first assigns the problem by giving the natural language description of the
problem and the logic formula of the regex, which is one of the representations of
the regular language. Then students submit the regex corresponding to the given
description. Finally, we use three algorithms for generating more convincing
partial grades and feedback by comparing the answer logic formula with the submission.


We aim to overcome several remaining limitations that have not been resolved
by the earlier approaches. First, we claim that it is not appropriate to grade
a student’s regex just by calculating the string edit-distance with the ‘solution
regex’. There could be infinitely many regexes that describe the same language.
Even when we consider the set of most compact regexes describing the regular
language in question, there can be multiple regexes since it is not guaranteed
that there is a unique minimal regex for a given regular language. Also, the
string edit-distance cannot take the structural similarity into account, while we
can obtain hierarchical information from the tree form of the regex. Second,
we should consider not only the syntactic discrepancies but also the semantic
discrepancies arising from the misinterpretation of the problem. In order to
compare the logical differences in real-time, the regex must be transformed with
the logic and converted to DFA in polynomial time. However, there is no compact
logic to do so. Lastly, there is a lack of rich feedback that helps students
study regexes. More detailed feedback, such as suggesting the shortest form of the
regex, logical differences between the answer and the submission, and an organized
form of the corner cases, would be more helpful than simple symbol correction
feedback.


Fig. 1. Overview of our automated regex grading framework


In order to resolve the above-mentioned issues, we propose a 3-step regex 
grading scheme that considers both syntactic and semantic discrepancies between 
submitted regexes and answer logic formulas (natural language descriptions). 
More specifically, first, to consider the syntactic discrepancy, instead of comparing 
a student’s regex with the solution regex, we compare the possible transforms 
of the student’s regex with the language of the solution. To this end, we apply 
tree-level edits to the parse tree of the regex to detect the possible syntactic 
mistakes made by the student. As shown in Fig. 1, after one tree-edit adding
the star operator to student A’s submission b + ab*a, the edited regex (b + ab*a)*
is equivalent to the TA’s logic. Second, we take into account the possibility that a
student simply misinterprets the specification of the language. For instance, we 
may consider that a submitted regex deserves a partial grade if the language 
expressed by the submission corresponds to a specification that is very similar 
to the given specification. Therefore, we consider the semantic discrepancy by 
applying logic-level edits to the logic formula for the specification and searching 
for a similar specification that exactly corresponds to the student’s regex. In this
way, by considering the ‘similarity’ to the student’s regex, we can give a partial
grade. For example, after one logic-edit changing the parameter from ‘a’ to
‘b’ in the TA’s logic, the edited logic num_div(b, 2, 0) is equivalent to student
B’s submission (a + ba*b)*. Finally, we take some corner cases into account, such
as when the language of a submitted regex misses a reasonably small portion of
the target language such as the empty string or a language consisting of a single
symbol (a* or b* when $\Sigma = \{a, b\}$). For instance, we can find that (b*ab*ab*)*
cannot generate strings that have no a’s and at least one b, while
it generates the empty string. Moreover, we generate productive feedback for
students using the byproduct of each partial grading algorithm so that they can 
understand what is wrong with the current submission and how to correct the 
submission into a correct regex. 
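
The three situations above can be checked on the running example ‘Even number of a’s’ with a small bounded experiment. The sketch below is only an illustration under stated assumptions: it enumerates all strings over {a, b} up to length 6 instead of comparing automata, it uses Python's `re` syntax (where union is written `|` rather than `+`), and the helper names are hypothetical.

```python
import re
from itertools import product


def strings(max_len, alphabet="ab"):
    """All strings over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def language(pattern, max_len=6):
    """Strings of length at most `max_len` fully matched by `pattern`."""
    rx = re.compile(pattern)
    return {w for w in strings(max_len) if rx.fullmatch(w)}


# Target specification num_div(a, 2, 0): the number of a's is even.
target = {w for w in strings(6) if w.count("a") % 2 == 0}
# Similar specification num_div(b, 2, 0): the number of b's is even.
even_b = {w for w in strings(6) if w.count("b") % 2 == 0}

student_a_edited = language(r"(b|ab*a)*")   # student A after adding the star
student_b = language(r"(a|ba*b)*")          # student B's submission
student_c = language(r"(b*ab*ab*)*")        # student C's submission
missing_c = target - student_c              # corner case missed by student C

if __name__ == "__main__":
    print(student_a_edited == target)
    print(student_b == even_b)
    print(sorted(missing_c, key=len))
```

On this bounded universe, the strings missed by student C are exactly the non-empty strings made only of b's, which is the kind of compact description the corner-case feedback aims for.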

The rest of the paper is organized as follows. Section 2 gives some definitions 
and notations. We introduce a set of declarative logic formulas for describing 
regular languages in Section 3 and our regex grading scheme in Section 4. The 
experimental results are provided in Section 5 and Section 6 concludes the paper. 


2 Preliminaries 


The size of a finite set S is denoted by $|S|$. Let $\Sigma$ denote a finite alphabet
and $\Sigma^*$ denote the set of all finite strings over $\Sigma$. For $m \in \mathbb{N}$, $\Sigma^{\leq m}$ is the set
of strings of length at most m over $\Sigma$. A language over $\Sigma$ is a subset of $\Sigma^*$.
Given a set X, $2^X$ denotes the power set of X. The symbol $\lambda$ denotes the empty
string. We define $\mathrm{mod}(m, n)$ to be $\{k \mid k \bmod m = n, k \in \mathbb{N}\}$. We also define
$\mathrm{ind}(w, x) = \{k \mid w[k : k + |x|] = x, k \in \mathbb{N}\}$, where $w[i : j]$ for $i < j$ denotes the
substring of w consisting of the characters of w from index i to $j - 1$, to be the set
of indices where x appears in w. Note that the index starts from 1.
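
As a concrete reading of these two definitions, here is a minimal Python sketch. Two points are assumptions of the example rather than part of the text: `mod` takes an extra `bound` argument because the set it defines is infinite, and both function names simply mirror the notation above.

```python
def mod(m, n, bound):
    """Elements k of mod(m, n) with 0 <= k <= bound."""
    return {k for k in range(bound + 1) if k % m == n}


def ind(w, x):
    """1-based start indices of the occurrences of x in w.

    Overlapping occurrences are all reported, as in the set definition.
    """
    return {k + 1 for k in range(len(w) - len(x) + 1) if w[k:k + len(x)] == x}


if __name__ == "__main__":
    print(sorted(mod(3, 0, 9)))
    print(sorted(ind("ababa", "aba")))
```

The shift `k + 1` converts Python's 0-based slices to the 1-based indices used in the paper.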

A regular expression (regex) over $\Sigma$ is $a \in \Sigma$ or the empty string $\lambda$, or is
obtained by applying the following rules finitely many times. For regexes $R_1$ and
$R_2$, the union $R_1 + R_2$, the concatenation $R_1 \cdot R_2$, and the Kleene-star $R_1^*$ are
also regexes.

Now we introduce a formal logic to be used to formally describe languages.
Let $w = w_1 w_2 \cdots w_n$ be a word over $\Sigma$. For any $i \in [1, n]$ and a symbol $a \in \Sigma$,
we say that a letter predicate a is true at i in w if $w_i = a$. For example, the logic
formula $a(x) \wedge \exists y (y > x \wedge b(y))$ means that ‘there is a symbol a at the position
x and a symbol b at the position later than x’. It is readily seen that the formula
describes the language described by the following regex: a(a + b)*b(a + b)*. It is
well-known that regular languages are expressible in monadic second-order (MSO)
logic [4].

Given a regex R, we define the parse tree T(R) to be the rooted tree representing
the hierarchical structure of R. Each leaf is labeled by a symbol in $\Sigma \cup \{\lambda\}$
and each internal node is labeled by n-ary operations such as $\cdot$ (concatenation)
and + (union), or unary operation * (Kleene-star). We define the regex tree
edit-distance ed,4(R, R’) of two regexes R and R’ to be the tree edit-distance 
between two parse trees of R and R’. Note that the tree edit-distance between 
T(R) and T(R’) is defined as the minimum number of edit-operations required
to transform the tree T(R) into T(R’), where an edit-operation for the regex
tree edit-distance can be defined as a substitution of an operation symbol or
a character from $\Sigma$ into a different operation symbol (or a character from $\Sigma$),
an insertion of a node, or a deletion of a node. It should be mentioned that we
perform unordered matching between children of nodes labeled by the union +
operator as the order of elements inside the union operator does not matter.
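
The role of unordered matching under the union operator can be illustrated with a small canonicalization routine. This is a simplified stand-in, not the tree edit-distance itself: it only decides whether two parse trees are equal up to reordering of union children. The tuple representation of trees (operator first, then children) and the function name are assumptions made for the example.

```python
def canonical(tree):
    """Return a canonical form in which children of '+' nodes are sorted.

    A tree is either a leaf (a one-character string) or a tuple
    (operator, child_1, ..., child_n) with operator in {'+', '.', '*'}.
    """
    if isinstance(tree, str):
        return tree
    op, *children = tree
    children = [canonical(child) for child in children]
    if op == "+":
        # The order of elements inside a union does not matter.
        children.sort(key=repr)
    return (op, *children)


if __name__ == "__main__":
    left = ("*", ("+", "b", (".", "a", ("*", "b"), "a")))    # (b + ab*a)*
    right = ("*", ("+", (".", "a", ("*", "b"), "a"), "b"))   # (ab*a + b)*
    print(canonical(left) == canonical(right))
```

Children of concatenation nodes are left in place, since swapping them changes the language.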


3 Simple Declarative Logic for Regular Languages 


Since MSO logic formulas offer a relatively higher-level specification of regular 
languages than finite-state automata recognizing the languages, they can be used 
for describing regular languages in a human-readable format. Moreover, we can 
always compile an MSO logic formula for a regular language into a corresponding 
minimal DFA [12] and therefore, a regex as well. 

As the transformation from MSO to DFA may require the size of the alphabet
to grow exponentially in the number of nested quantifiers [8], we restrict our
attention to logic formulas that can describe all regular languages considered
in famous automata textbooks, without covering the whole class of regular
languages, while being convertible into a corresponding DFA in polynomial time.
Table 2 shows the list of declarative logic formulas considered in this paper. Recall
that MOSEL [2], an extension of MSO logic with some syntactic sugar to allow
describing regular languages more concisely, is introduced for a similar reason.
However, we claim that our logic formulas directly correspond to NL descriptions
at a much higher level and allow us to perform language equivalence tests in
practical runtime.
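
To make the correspondence between declarative formulas and regexes concrete, the sketch below implements a few of the formulas of Table 2 as Python predicates and compares them with candidate regexes. It is a bounded experiment under stated assumptions, not the DFA-based equivalence test the text refers to: strings over {a, b} are enumerated up to a fixed length, regexes use Python's `re` syntax (`|` for union), and all helper names are hypothetical.

```python
import operator
import re
from itertools import product

COMPARE = {">": operator.gt, "=": operator.eq, "<": operator.lt}


def count(w, x):
    """Number of (possibly overlapping) occurrences of x in w."""
    return sum(w.startswith(x, k) for k in range(len(w) - len(x) + 1))


def pos(x, n):
    """x starts at the n-th position (1-based)."""
    return lambda w: w.startswith(x, n - 1)


def pos_rev(x, n):
    """x is followed by exactly n - 1 symbols, as in the set notation."""
    return lambda w: len(w) >= n - 1 and w[:len(w) - (n - 1)].endswith(x)


def num(x, op, n):
    """The number of occurrences of x compares to n with `op`."""
    return lambda w: COMPARE[op](count(w, x), n)


def num_div(x, m, n):
    """The number of occurrences of x is congruent to n modulo m."""
    return lambda w: count(w, x) % m == n


def both(p, q):
    """Conjunction of two predicates."""
    return lambda w: p(w) and q(w)


def agrees_with_regex(predicate, pattern, max_len=7, alphabet="ab"):
    """Check predicate and regex on every string of length <= max_len."""
    rx = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(predicate(w)) != bool(rx.fullmatch(w)):
                return False
    return True


if __name__ == "__main__":
    print(agrees_with_regex(both(pos("b", 1), pos_rev("a", 1)), r"b(a|b)*a"))
    print(agrees_with_regex(num("a", "<", 3), r"b*a?b*a?b*"))
    print(agrees_with_regex(num_div("a", 2, 0), r"(b*ab*ab*)*"))
```

A bounded check can only refute equivalence with certainty; agreement up to the bound is evidence, not a proof, which is why a conversion to DFAs is needed for actual grading.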

Analogously to the parse tree of a regex, we define the parse tree $T(\phi)$ for a
given logic formula $\phi$. Here each leaf is labeled by an atomic formula and each
internal node is labeled by unary logical connectives $\neg$ (negation) or n-ary logical
connectives such as $\wedge$ (conjunction) and $\vee$ (disjunction). Similarly to the regex
tree edit-distance, we also define the logic tree edit-distance edn$(\phi, \hat{\phi})$ of two logic
formulas $\phi$ and $\hat{\phi}$ as the unordered tree edit-distance between two parse trees of
$\phi$ and $\hat{\phi}$. Note that we allow the substitution of an atomic logic formula and two
logical connectives, conjunction, and disjunction, for the logic tree edit-distance.
We also allow the insertion and deletion of negation. The substitution of an atomic
logic formula is available for a single parameter such as strings x, y, non-negative
integers m, n, and a comparison operator $\odot \in \{>, =, <\}$. While the edit cost of
the substitution of a logical connective equals 1, we assign the string edit-distance
for the substitution of a string parameter, the numerical difference for an integer,
and the value 1 for the substitution of a comparison operator.
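
The substitution costs just listed can be sketched as a small function. This is a minimal illustration of the cost assignment only, not of the full logic tree edit-distance: the `kind` labels, the function names, and the use of the classical Levenshtein distance as the string edit-distance are assumptions made for the example.

```python
def levenshtein(s, t):
    """Classical string edit-distance (insertions, deletions, substitutions)."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (cs != ct)))   # substitution
        previous = current
    return previous[-1]


def substitution_cost(kind, old, new):
    """Cost of substituting `old` by `new` in a logic formula.

    `kind` is one of 'connective', 'comparison', 'integer', 'string'.
    """
    if old == new:
        return 0
    if kind in ("connective", "comparison"):
        return 1
    if kind == "integer":
        return abs(old - new)
    if kind == "string":
        return levenshtein(old, new)
    raise ValueError(f"unknown kind: {kind}")


if __name__ == "__main__":
    # num_div(a, 2, 0) -> num_div(b, 2, 0): one string parameter changes.
    print(substitution_cost("string", "a", "b"))
    # num_div(a, 2, 0) -> num_div(a, 3, 0): the integer parameter changes.
    print(substitution_cost("integer", 2, 3))
```

With these costs, a formula that differs by a distant integer parameter is penalized more than one that differs by a single symbol or connective.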

We provide a list of regex problems and solutions collected from famous
automata textbooks in Table 1. For each problem, we provide a natural language
description for a regular language in question, a solution regular expression given
in the textbook, and the corresponding logic formula found by us. We denote
$a + \lambda$ by a? for brevity.


Table 1. A list of regex problems from famous automata textbooks.

No. | Description | Solution Regex | Logic Formula
1 | Starts with a. | aσ* | pos(a, 1)
2 | Ends with ab. | σ*ab | pos_rev(ba, 1)
3 | Contains the substring abab. | σ*ababσ* | num(abab, >, 0)
4 | Begins with b and ends with a. | bσ*a | pos(b, 1) ∧ pos_rev(a, 1)
5 | Length is at least 3 and the 3rd symbol is a. | σσaσ* | pos(a, 3)
6 | Length is a multiple of 3. | (σσσ)* | len_div(3, 0)
7 | The number of a’s is divisible by 3. | (b*ab*ab*ab*)* | num_div(a, 3, 0)
8 | Even number of a’s. | (b + ab*a)* | num_div(a, 2, 0)
9 | The 5th symbol from the right end is b. | σ*bσσσσ | pos_rev(b, 5)
10 | a and b alternate. | b?(ab)*a? | num(aa, =, 0) ∧ num(bb, =, 0)
11 | Each a is followed by at least one b. | (a?b)* | allX_followedbyY(a, b)
12 | $a^n b^m$ where $n \geq 3$ and m is even. | aaaa*(bb)* | allX_beforeY(a, b) ∧ num(a, >, 2) ∧ num_div(b, 2, 0)
13 | Contains less than three a’s. | b*a?b*a?b* | num(a, <, 3)
14 | Start with a and have odd length or start with b and have even length. | a(σσ)* + bσ(σσ)* | (pos(a, 1) ∧ len_div(2, 1)) ∨ (pos(b, 1) ∧ len_div(2, 0))
15 | Any strings except a and b. | ((σσ)σ*)? | ¬single_word(a) ∧ ¬single_word(b)
16 | Does not end with ab. | σ*(aa + ba + bb) + σ? | ¬pos_rev(ba, 1)
17 | Contains at least one a and at most one b. | aa* + aa*ba* + a*baa* | num(a, >, 0) ∧ num(b, <, 2)
18 | At least two occurrences of b between any two occurrences of a. | b* + b*(abbb*)*ab* | exists_between(b, a, 2)
19 | Does not contain baa as a substring. | a*(ba + b)* | num(baa, =, 0)
20 | Every odd position is b. | (b(σb)*σ?)? | pos_every(b, 2, 1)
21 | Has exactly one pair of consecutive a’s. | (ab + b)*aa(ba + b)* | num(aa, =, 1)
22 | Does not end with ba and the length is at least two. | σ*(aa + ab + bb) | ¬pos_rev(ab, 1) ∧ len(1, >)
23 | Even number of a’s and each a is followed by at least one b. | b*(abb*abb*)* | num_div(a, 2, 0) ∧ allX_followedbyY(a, b)
24 | Every pair of adjacent a’s appears before any pair of adjacent b’s. | (a + ba)*(b + ab)*a? | allX_beforeY(aa, bb)
25 | At most one pair of consecutive b’s. | (a + ba)*(bb)?(a + ab)* | num(bb, <, 2)


4 Regex Grading Algorithm


In this section, we explain our automated regex grading algorithm by considering 
both syntactic and semantic properties. 


4.1 Grading of Regexes 


Let us assume that exact logic formulas for regular languages asked in questions 
are already known as teachers always can specify the regular languages with 
Table 2. A list of declarative logic formulas used to describe regular languages that
appear in famous automata textbooks, where m, n ∈ ℕ, a, b ∈ Σ, x, y ∈ Σ*, and
□ ∈ {>, =, <}. In the set notation, we broadcast +n and −n for some integer n to each
element of the given set.

Logic Formula              Description / Set Notation

single_word(x)             Accepts a string x. /
                           {x}
pos(x, n)                  Substring x starts at nth position. /
                           {wxv | |w| = n − 1 ∧ w, v ∈ Σ*}
pos_rev(x, n)              Substring x starts at nth position in reverse order. /
                           {wxv | |v| = n − 1 ∧ w, v ∈ Σ*}
len(□, n)                  Strings of length □ n. /
                           {x | |x| □ n}
len_div(m, n)              Strings of length ∈ mod(m, n). /
                           {x | |x| ∈ mod(m, n)}
pos_every(x, m, n)         Substring x appears at every mod(m, n)th position. /
                           {w | ind(w, x) = mod(m, n) ∩ [1, |w|]}
num(x, □, n)               Contains x as a substring □ n times. /
                           {w | |ind(w, x)| □ n}
num_div(x, m, n)           Contains x as a substring mod(m, n) times. /
                           {w | |ind(w, x)| ∈ mod(m, n)}
allX_followedbyY(x, y)     Every substring x is followed by y. /
                           {w | ind(w, x) + |x| ⊆ ind(w, y)}
allX_followingY(x, y)      Every substring x is following y. /
                           {w | ind(w, x) − |y| ⊆ ind(w, y)}
allX_beforeY(x, y, n)      Every substring x appears before any occurrence of y. /
                           {w | max(ind(w, x)) < min(ind(w, y))}
exists_between(b, a, n)    b appears n times between every adjacent pair of a’s. /
                           {w | ∀i, j ∈ ind(w, a) s.t. |ind(w, a) ∩ [i, j]| = 2,
                           |ind(w, b) ∩ [i, j]| = n}
consecutive(a, □, n)       Every a appears □ n times consecutively. /
                           {w | ∀i ∈ ind(w, a) s.t. w[i − 1] ≠ a and
                           j = argmax_j {w[i : j] ∈ a*}, (j − i) □ n}
consecutive_div(a, m, n)   Every a appears mod(m, n) times consecutively. /
                           {w | ∀i ∈ ind(w, a) s.t. w[i − 1] ≠ a and
                           j = argmax_j {w[i : j] ∈ a*}, (j − i) ∈ mod(m, n)}


Table 3. Examples of incorrect regexes for ‘Even number of a’s’, which has a possible
solution (b + ab*a)*.

Error Type        Regex          Error Analysis
Syntactic error   b + ab*a       Star operator is missing.
Logical error     (a + ba*b)*    Accepts “Even number of b’s”.
Semantic error    (b*ab*ab*)*    Does not accept strings consisting only of b’s.


the provided logic formulas in Table 2. We aim at grading the submitted regex
in terms of two types of syntactic correctness and a set of counterexamples as
follows:
Syntactic grading Recall that previous approaches to computing the syntactic
similarity or dissimilarity between two regexes rely on string edit-distance between
two regexes. However, the string edit-distance between two regexes does not
take the structural similarity into account. We instead use the tree edit-distance
between two parse trees of regexes as the tree edit-distance better reflects the
structural similarity of regexes. One of the advantages of using the tree
edit-distance is that we can also easily identify semantically equivalent regexes when
they are viewed as parse trees rather than as strings.

Then, we define the syntactic grade of R based on the minimum tree edit-distance
between R and an unknown regex R̂ such that L(R̂) = L(φ). Formally, the syntactic
grade of R is defined as follows:

$$G_{syn} = G_{full} - w_{syn}(R) \cdot \min\{\mathrm{ed}_T(R, \hat{R}) \mid L(\hat{R}) = L(\phi)\}, \qquad (1)$$

where G_full is the full grade (10 in our implementation). The function w_syn
scales the deducted points based on the length of the submitted regex R: if R is
very long and requires only a single edit, then we may consider R syntactically
similar enough to a solution.

Let us explain the detailed procedure for computing G_syn. We first parse
the regex R as a binary tree and construct the set S_{R,n} = {R̂ | ed_T(R, R̂) ≤ n}
of regexes within tree edit-distance n of R (n = 2 in our experiments). Note that
we use the tree edit-distance instead of the string edit-distance used in AT v3 and
RegED, as the tree edit-distance is better suited to measuring the syntactic
difference between two regexes. For instance, the tree edit-distance between a + b
and (b + a)* is one while the string edit-distance is five.

To run the above procedure more efficiently, we increment the value of n
from zero by one at each iteration until we find such an R̂. We also check whether
the current regex has already been examined in a previous iteration by comparing
the parse trees of regexes, so that our implementation can avoid redundant regex
equivalence tests.
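
The contrast between the two distances for a + b and (b + a)* can be checked with a few lines of Python. This is only an illustrative sketch: the tuple encoding of parse trees and the use of a frozenset to make union order-insensitive are assumptions made here, not the representation used in the implementation described above.

```python
from functools import lru_cache

def levenshtein(s: str, t: str) -> int:
    """Classic string edit-distance (insert, delete, substitute)."""
    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if s[i - 1] == t[j - 1] else 1
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + cost)
    return d(len(s), len(t))

# Parse trees as nested tuples. Union children live in a frozenset, which is
# one (assumed) way to treat a + b and b + a as the same tree.
def union(*parts):
    return ("union", frozenset(parts))

def star(r):
    return ("star", r)

submitted = union("a", "b")        # a + b
solution = star(union("b", "a"))   # (b + a)*

string_distance = levenshtein("a+b", "(b+a)*")
one_star_away = solution == star(submitted)   # a single star node is missing
print(string_distance, one_star_away)
```

Under these assumptions the string distance is five, while the two trees differ by exactly one inserted star node.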


Logical grading Given a problem ‘A regex for strings where the string aba appears
at the 3rd position.’, a student may submit an incorrect solution (a + b)aba(a + b)*
by misreading the number ‘3’ as ‘2’. Because the most plausible
answer is (a + b)(a + b)aba(a + b)*, the student’s submission is likely to receive no
partial grade according to the syntactic grading, which could be a harsh decision
for an elementary mistake. However, if we semantically compare the submission
and the problem, the submission can still receive a partial grade, as the two turn
out to be very similar in terms of the corresponding logic formulas pos(aba, 2) and
pos(aba, 3).

The main challenge in logical grading is to find a logic formula that corresponds
to the submitted regex, so that we can effectively quantify the semantic
discrepancy between the submitted regex and the problem. Given a regex, finding a
logic formula described as a logical combination of the formulas provided in Table 2
requires a considerable amount of computation, assuming that the only feasible
approach is an exhaustive tree search. Even worse, it is not always possible to find
such a corresponding logic formula, as the provided set of logic formulas cannot
cover the entire class of regular languages. In order to save computation time, we
instead start from the solution logic formula: we apply tree-level edits to its parse
tree at most n times (again, n = 2 in our implementation) and check whether the
edited formula is language-equivalent to the submitted regex.

If we manage to find a logic formula φ̂ that corresponds to the submitted
regex, then the logical grade of R is computed as follows:

$$G_{log} = G_{full} - w_{log}(\phi) \cdot \min\{\mathrm{ed}_T(\phi, \hat{\phi}) \mid L(\hat{\phi}) = L(R)\}. \qquad (2)$$
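
The search over edited solution formulas can be sketched for the pos(aba, 3) example. The code below is a simplified illustration, not the actual implementation: it only handles pos formulas, generates single-argument edits, and replaces the DFA equivalence test by a bounded comparison on all words up to a fixed length. The helper names (`pos`, `one_edit_variants`, `same_language`) are hypothetical.

```python
import re
from itertools import product

SIGMA = "ab"

def words(max_len):
    """All words over SIGMA up to max_len, shortest first."""
    for n in range(max_len + 1):
        for tup in product(SIGMA, repeat=n):
            yield "".join(tup)

def pos(x, n):
    """pos(x, n): substring x starts at the n-th position (1-indexed)."""
    return lambda w: w[n - 1:n - 1 + len(x)] == x

def same_language(pred, pattern, max_len=8):
    """Bounded check only: compares both sides on all words up to max_len."""
    rx = re.compile(pattern)
    return all(pred(w) == bool(rx.fullmatch(w)) for w in words(max_len))

def one_edit_variants(x, n):
    """Formulas one edit away from pos(x, n): change n or one symbol of x."""
    for m in (n - 1, n + 1):
        if m >= 1:
            yield ("pos", x, m)
    for i, c in enumerate(x):
        for d in SIGMA:
            if d != c:
                yield ("pos", x[:i] + d + x[i + 1:], n)

submitted = r"[ab]aba[ab]*"   # (a + b)aba(a + b)* in Python re syntax
matches = [f for f in one_edit_variants("aba", 3)
           if same_language(pos(f[1], f[2]), submitted)]
print(matches)
```

In this sketch the only one-edit variant that agrees with the submission is pos(aba, 2), which is the formula the text associates with the student's mistake.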


Corner case grading In some cases, the submitted regex may describe a language
very similar to the language in question although the regex is syntactically
different (e.g., the tree edit-distance is larger than n). For instance, consider
the problem “Strings with even number of a’s.” provided in Table 3. The language
described by the regex (b*ab*ab*)* is quite similar to the described language,
except for strings consisting only of b’s. In order to check whether the submitted
regex deserves a corner case partial grade, we construct two DFAs for the following
languages: L(R) ∩ L(φ)^c and L(R)^c ∩ L(φ). The language L(R) ∩ L(φ)^c is the set
of strings that are described by R but not by φ (false positive examples).
Conversely, L(R)^c ∩ L(φ) is the set of strings that are described by φ but not
by R (false negative examples). We enumerate the strings from both DFAs in
lexicographical order using the enumDFA function in the FAdo library and display
them to users, so that the counterexamples help them understand why their
submissions are incorrect.

We also assign a corner case grade Geor = 4 x Gun if false positive and false 
negative sets satisfy one of the following conditions:

1. There is only ε in either the false positive or the false negative set.
2. There are fewer than m false positive and false negative strings.
3. L(R) ∪ {a*} = L(φ) or L(R) ∪ {b*} = L(φ).
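
The false positive and false negative sets for the (b*ab*ab*)* example can be approximated by brute force. The sketch below does not use FAdo or DFAs; it simply enumerates all words up to a small length bound with Python's `re` module, so it illustrates the idea rather than the implementation. The function names and the bound are assumptions.

```python
import re
from itertools import product

def words(max_len, sigma="ab"):
    """All words over sigma, shortest first, then in lexicographic order."""
    for n in range(max_len + 1):
        for tup in product(sigma, repeat=n):
            yield "".join(tup)

def counterexamples(submitted, in_target, max_len=4):
    """Return (false positives, false negatives) among words up to max_len."""
    rx = re.compile(submitted)
    false_pos, false_neg = [], []
    for w in words(max_len):
        accepted = bool(rx.fullmatch(w))
        if accepted and not in_target(w):
            false_pos.append(w)
        elif in_target(w) and not accepted:
            false_neg.append(w)
    return false_pos, false_neg

def even_as(w):
    """Membership test for num_div(a, 2, 0): an even number of a's."""
    return w.count("a") % 2 == 0

fp, fn = counterexamples(r"(b*ab*ab*)*", even_as)
only_bs_missing = all(set(w) <= {"b"} for w in fn)   # bounded view of condition 3
print(fp, fn, only_bs_missing)
```

Within the bound there are no false positives, and every false negative consists only of b's, which matches the third condition in the list above.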


4.2 State Complexity of Logic Formula’s DFAs 


It is easy to see that all atomic logic formulas presented in Table 2 can be
represented by DFAs of size linear in the lengths of string parameters. In the
following, m, n ∈ ℕ, a, b ∈ Σ, x, y ∈ Σ*, and □ ∈ {>, =, <}.


Proposition 1. For each atomic logic formula φ in Table 2, we can construct a
DFA recognizing L(φ) with a polynomial number of states in |x| and |y|.


While most of the formulas in Table 2 can be represented as DFAs of size
linear in the numerical parameters m and n as well, there are two exceptions:
‘pos_rev(x, n)’ and ‘pos_every_rev(x, m, n)’.


Proposition 2. For each atomic logic formula φ in Table 2 except pos_rev(x, n)
and pos_every_rev(x, m, n), we can construct a DFA recognizing L(φ) with a
polynomial number of states in m and n.
Fig. 2. An NFA for pos_rev(a, n). (States q0, q1, q2, ...; transitions labeled a, b.)


Unlike the other formulas, the state complexity of pos_rev(x,n) and 
pos_every_rev(x, m,n) is exponential in n in the worst case. 


Lemma 1. The state complexity of pos_rev(x,n) is exponential in n. 


Proof. Since the NFA construction for pos_rev(x, n) requires |x| + n + 1 states,
we have a simple upper bound 2^{|x|+n+1}, which is exponential in n, for the state
complexity of pos_rev(x, n).

The simplest example where the lower bound is also exponential in n is when
x is a string of length one such as a or b. See Fig. 2 for an NFA accepting
the regular language pos_rev(a, n). Since the initial state q0 has a self-loop
labeled by Σ, it is easy to see that the upper bound of the state complexity is 2^n
as q0 is always in the state set in the subset construction.

Now we will show that the upper bound 2^n can be reached by describing how
we can reach any subset of states from 2^{{q_1, q_2, ..., q_{n+1}}}. Let us consider a
state set P = {q_{s_1}, q_{s_2}, ..., q_{s_k}}, where s_i < s_j for 1 ≤ i < j ≤ k ≤ n + 1.
Then, we can reach P by reading the following string:

$$a\,b^{s_k - s_{k-1} - 1}\,a\,b^{s_{k-1} - s_{k-2} - 1} \cdots a\,b^{s_1 - 1}$$

Since it is easy to see that all states in 2^{{q_1, q_2, ..., q_{n+1}}} are pairwise
distinguishable, we conclude that the state complexity of pos_rev(a, n) is 2^n.
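
The blow-up can be observed directly by running the subset construction on the NFA for 'the n-th symbol from the end is a'. The following sketch uses its own state numbering (0 to n, with state 0 as the looping initial state), which is an assumption and may differ from the labels in Fig. 2; it counts reachable subsets and does not minimize the resulting DFA.

```python
def reachable_subsets(n, sigma="ab"):
    """Subset construction for the NFA of 'the n-th symbol from the end is a'.

    States are 0..n: state 0 loops on every symbol and also moves to 1 on 'a';
    state i moves to i + 1 on every symbol for 1 <= i < n; state n is final.
    """
    def step(states, c):
        nxt = set()
        for q in states:
            if q == 0:
                nxt.add(0)
                if c == "a":
                    nxt.add(1)
            elif q < n:
                nxt.add(q + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:
        cur = frontier.pop()
        for c in sigma:
            nxt = step(cur, c)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

for n in range(1, 6):
    print(n, len(reachable_subsets(n)))
```

The number of reachable subsets doubles with each increment of n, in line with the 2^n bound discussed in the proof.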


Now the following state complexity is obvious from the above observation.

Proposition 3. The state complexity of pos_every_rev(x, m, n) is exponential
in n.

4.3 Heuristics for Faster Computation 


In order to avoid this exponential blow-up in the size of DFAs, we employ the 
following two heuristics for faster computation of grades. 


Regex reverse trick Interestingly, we can avoid this exponential blow-up caused
by pos_rev(x, n) by reversing the given regex and the logic formula at the
same time. We can trivially reverse the regex while maintaining the length and
construct polynomial-sized DFAs for all reversed logic formulas except pos(x, n).
For instance, suppose that we are given a regex R and a declarative logic formula φ
as follows:


R = a(a + b)b*b and
φ = pos_rev(b, n) ∧ len(>, 3) ∧ num(a, >, 1).


In order to avoid the exponential blow-up by pos_rev(x, n), we reverse R
and φ as follows:


R′ = bb*(a + b)a and
φ′ = pos(b, n) ∧ len(>, 3) ∧ num(a, >, 1).


Note that logic formulas such as len(□, n) and num(x, □, n) are reversal-invariant.
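
Reversing a regex only requires flipping the operand order of every concatenation. The sketch below shows this on the example R = a(a + b)b*b; the nested-tuple encoding of the parse tree and the `show` pretty-printer are assumptions made for illustration.

```python
def reverse(r):
    """Reverse a regex parse tree: only concatenation changes operand order."""
    if isinstance(r, str):                      # a single symbol
        return r
    op, *args = r
    if op == "cat":
        return ("cat", *[reverse(a) for a in reversed(args)])
    return (op, *[reverse(a) for a in args])    # "union" and "star"

def show(r):
    """Render a parse tree in the paper's notation (+ for union)."""
    if isinstance(r, str):
        return r
    op, *args = r
    if op == "cat":
        return "".join(show(a) for a in args)
    if op == "union":
        return "(" + " + ".join(show(a) for a in args) + ")"
    inner = show(args[0])                       # op == "star"
    needs_parens = not isinstance(args[0], str) and args[0][0] != "union"
    return ("(" + inner + ")" if needs_parens else inner) + "*"

R = ("cat", "a", ("union", "a", "b"), ("star", "b"), "b")
print(show(R), "->", show(reverse(R)))
```

The reversed tree has the same number of nodes as the original, which is why the reversal does not change the length of the regex.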


Concise Normal Form Recall that we construct a set of regexes from a submitted
regex R by applying parse-tree-level edits for computing the syntactic grade.
The main computational bottleneck comes from the repeated regex equivalence
tests, as there are too many regexes in the set. In order to reduce the size of
the constructed set, we employ the concise normal form [11] of regexes, which
has been shown to substantially reduce the number of redundant regexes.
For instance, we inductively apply substitution rules for subregexes such as
R*R → RR*, R*R* → R*, R + R* → R*, and (R*)* → R* for concise regex
representation and pruning of redundant regexes.
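
The four substitution rules quoted above can be applied in a single bottom-up pass over a binary parse tree. The sketch below implements only these four rules (plus the mirrored case R* + R, which is an assumption) and is not the full concise normal form of [11].

```python
def is_star(r):
    return isinstance(r, tuple) and r[0] == "star"

def simplify(r):
    """One bottom-up pass applying the four rewrite rules to a binary tree."""
    if isinstance(r, str):
        return r
    op, *args = r
    args = [simplify(a) for a in args]
    if op == "star":
        # (R*)* -> R*
        return args[0] if is_star(args[0]) else ("star", args[0])
    left, right = args
    if op == "cat":
        if is_star(left) and left == right:        # R*R* -> R*
            return left
        if is_star(left) and left[1] == right:     # R*R -> RR*
            return ("cat", right, left)
    if op == "union":
        if is_star(right) and right[1] == left:    # R + R* -> R*
            return right
        if is_star(left) and left[1] == right:     # R* + R -> R* (assumed mirror case)
            return left
    return (op, left, right)

a_star = ("star", "a")
print(simplify(("star", a_star)))
print(simplify(("cat", a_star, a_star)))
print(simplify(("union", "a", a_star)))
print(simplify(("cat", a_star, "a")))
```

Because many edited candidates collapse to the same simplified tree, comparing simplified trees before running an equivalence test is one way to skip redundant tests.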


4.4 Description of Regex Grading Algorithm 


Algorithm 1 precisely describes the whole procedure for computing the final 
grade of a student’s regex R for a problem corresponding to a declarative logic 
formula ¢. First, we preprocess the given student’s regex R and declarative logic 
formula using the normal form and reverse trick for faster computation and 
convert them into the DFAs for partial grading. If the submission is equivalent to 
the solution, then give 10 points. If not, give the highest point among the three 
partial grades. 


4.5 Converting Regex to NL Description 


Many researchers have studied the problem of translating an NL description into 
a corresponding regex [13,15,17]. Here we examine a dual problem, namely, the 
problem of converting a regex into an NL description (Regex2NL) to help regex 
learners easily understand the language accepted by the given regex. Consider 
(b + ab*a)* for an example again. Instead of merely translating the semantics of 
regex operators and symbols, our goal is to generate an ‘easy-to-understand’ NL 
description such as ‘even number of a’s’ which corresponds to a logic formula 
defined in Table 1. 

Our approach involves two steps, where we first find a logic formula corre- 
sponding to the regex and then translate the logic formula into an NL description 
Algorithm 1: Our Regex Grading Algorithm
Input : A student’s regex R and a declarative logic formula φ
Output: A grade, feedback of R for the problem specified by φ, and a set of
        counter-examples
  Convert R into R′ which is in a regex normal form;
  if φ contains pos_rev(x, n) and not pos(x, n) then
      Reverse R′ and φ;
  Construct a DFA A_R′ for R′ and a DFA A_φ for φ;
  if L(A_R′) = L(A_φ) then
      if |R′| < |R| then
          return 10, ‘R can be written in more compact form such as R′’, ∅;
      else
          return 10, ‘Well constructed’, ∅;
  else
      Compute (G_syn, R̂) and (G_log, φ̂) of R;
      Generate a set S of random strings from L(φ) ∩ L(R)^c;
      if G_syn > G_log then
          return G_syn, ‘R should include ... to be the R̂’, S;
      else
          return G_log, ‘R accepts a language specified by φ̂’, S;


by rules. It is worth noting again that there are regexes that cannot be effectively 
described by our logic. Therefore, it is not always possible to find a corresponding 
logic from a given regex even if we enumerate all logic formulas. Even if there 
exists a corresponding logic for the given regex, it takes too much time (more 
than one minute in general) for practical use in most cases. Hence we propose to 
use a deep learning-based approach that can predict a logic formula from a given 
regex with reasonably high accuracy in practical runtime (less than one second). 


First, we train the Regex2Logic model, which translates a regex to a logic formula,
using a sequence-to-sequence neural network with attention mechanism [3]. For
training, we use a dataset consisting of 13,437 pairs of regexes and logic formulas
defined by our simple declarative logic formulas. The pairs are collected by
time-consuming enumerations of regexes and logic formulas, and by regex
templates. We split the pairs in the ratio of 8:1:1 into training, validation,
and test sets. We explain each process in more detail as follows:


1. Regex enumeration: enumerate regexes from the simplest one to more complex 
ones by increasing the depth of parse trees of regexes and searching for 
corresponding logic formulas until pre-defined thresholds (two for the depth, 
three for the length of argument strings and integers) for the complexity of 
logic formulas are reached. 
Table 4. Statistics of the constructed regex-logic pair dataset used to train our Regex2NL
model. φ, φ1, and φ2 denote atomic logic formulas found by enumerations of regexes
and logic formulas or regex templates.

Logic Formula               # Examples
φ1 ∧ φ2                     1,202
φ1 ∨ φ2                     1,939
¬φ                          463
single_word(x)              88
pos(x, n)                   3,854
pos_rev(x, n)               3,824
len(□, n)                   73
len_div(m, n)               20
pos_every(x, m, n)          0
num(x, □, n)                954
num_div(x, m, n)            8
allX_followedbyY(x, y)      699
allX_followingY(x, y)       0
allX_beforeY(x, y, n)       184
exists_between(a, b, n)     0
consecutive(a, □, n)        59
consecutive_div(a, m, n)    70

Total Number of Formulas    13,437


2. Logic formula enumeration: enumerate atomic logic formulas by varying the 
arguments such as strings of length up to n and integers from 1 to n and find 
a corresponding regex by exhaustively enumerating regexes. 

3. Regex template: use regex templates for which we can easily match
corresponding logic formulas. For instance, regexes with no operator such as aba
correspond to the logic formula single_word(aba).


Table 4 shows the statistics of our dataset, especially in terms of the distri- 
bution of logic formulas used. The conjunction or disjunction of the same logic 
formulas is counted as a conjunction or disjunction. 

In order to construct a set of regex-logic pairs, we can manually define a regex 
in a generalized form for each logic formula with arbitrary arguments. We rely on 
the following list of regex templates for generating various regexes by changing 
arguments of the templates: 


– pos(x, n) : σ^{n−1} x σ*
– pos_rev(x, n) : σ* x σ^{n−1}
– len(=, n) : σ^n
– len(<, n) : (σ + λ)^{n−1}
– len(<, n) : σ^0 + σ^1 + σ^2 + ... + σ^{n−1}
– len(>, n) : σ^{n+1} σ*
– len_div(x, m, n) : σ^n (σ^m)*
– len_div(x, m, n) : (σ^m)* σ^n
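
To illustrate how such templates turn arguments into concrete regexes, the sketch below instantiates a few of them in Python `re` syntax and checks each against a direct statement of its meaning on all short words. The binary alphabet, the function names, and the bounded check are assumptions made for this example; the len_div check assumes n < m.

```python
import re
from itertools import product

SIGMA = "[ab]"   # Python `re` spelling of σ = a + b (assumed binary alphabet)

def pos_template(x, n):          # σ^(n-1) x σ*
    return f"{SIGMA}{{{n - 1}}}{x}{SIGMA}*"

def pos_rev_template(x, n):      # σ* x σ^(n-1)
    return f"{SIGMA}*{x}{SIGMA}{{{n - 1}}}"

def len_gt_template(n):          # σ^(n+1) σ*
    return f"{SIGMA}{{{n + 1}}}{SIGMA}*"

def len_div_template(m, n):      # σ^n (σ^m)*
    return f"{SIGMA}{{{n}}}(?:{SIGMA}{{{m}}})*"

def agrees(pattern, predicate, max_len=7):
    """Bounded sanity check of a template against its intended meaning."""
    rx = re.compile(pattern)
    return all(bool(rx.fullmatch(w)) == predicate(w)
               for k in range(max_len + 1)
               for w in map("".join, product("ab", repeat=k)))

print(pos_template("ab", 3))
print(agrees(pos_template("ab", 3), lambda w: w[2:4] == "ab"))
print(agrees(pos_rev_template("ab", 2), lambda w: len(w) >= 3 and w[-3:-1] == "ab"))
print(agrees(len_gt_template(2), lambda w: len(w) > 2))
print(agrees(len_div_template(3, 1), lambda w: len(w) % 3 == 1))
```

Varying x, m, and n over enumerated strings and integers then yields one regex-logic pair per instantiation.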
By applying enumerated strings and integers as arguments, we can collect
many regex-logic pairs. Once we discover the initial set of regex-logic pairs, we
augment the data by combining the regexes and logic formulas with a regex
operator + and a logical connective ∨, respectively.

Note that our Regex2NL achieves about 92.3% prediction accuracy on the
test set. For 167 incorrect regex submissions from students, our logical grading
module finds 21 logic formulas that are within logic tree edit-distance two from
the solution logic formula. Among the remaining 146 regexes, our model predicts
39 logic formulas that actually correspond to the given regexes. Together, the
logical grading module and the Regex2Logic model therefore provide
‘easy-to-understand’ NL descriptions for 60 of the 167 incorrect submissions
(35.9%). We believe this is very useful, given that most regexes do not have
corresponding logic formulas definable by the proposed set of simple declarative
logic formulas, as we already discussed.

Then, we transform the logic formula given by Regex2Logic into a natural
language description using heuristic templates. Such templates are easy to write,
as the logic formulas already read much like natural language. The entire
Regex2NL framework can be used not only for feedback on incorrect submissions
but also for generating random regex problems. For example, we can first generate
a random regex by regex enumeration or from a regex template, and then
translate the regex into a natural language description, which yields a regex-NL
pair that can serve as a regex problem.


4.6 Feedback Generation 


There are natural types of feedback such as binary feedback (correct /wrong), 
an example, and a natural language-based conceptual hint. Binary feedback 
is the simplest yet necessary feedback that should be provided to students 
who submitted regexes. We can also simply generate a counterexample if the 
submitted regex is not correct. We focus on generating a natural language-based 
conceptual hint that describes the discrepancy between the desired solution and 
the submitted solution in an easily understandable manner. 

When the submitted regex is not correct, there can be two cases as follows. 
First, the submitted regex should be slightly revised in order to accept the desired 
language. In this case, the most desirable feedback may be the way to revise the 
submitted regex. Second, the submitted regex accepts a semantically different 
language than the desired language as the student may have misinterpreted 
the question. Then, we may need to inform the student about the semantic 
discrepancy between the language described by the submitted regex and the 
desired regular language in an easily understandable manner. 

For the first case, we provide the regex edit sequence between the submitted
regex R and a regex R′ which is syntactically closest (with the smallest regex
edit-distance) to R while accepting the regular language specified in the problem.
For the second case, we suggest the logic edit sequence between the logic formula φ̂
corresponding to R and the logic formula φ specified in the problem. If the problem
asks for the regular language “strings containing a substring abab at least once”,
which corresponds to num(abab, >, 0), and the submitted regex captures a regular
language corresponding to num(ab, =, 0), then we provide the following feedback:
“Consider substring abab instead of ab and operator > instead of =.”
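
For two atomic formulas of the same kind, a hint of this shape can be produced by comparing the arguments position by position. The sketch below is a hypothetical simplification: formulas are plain tuples, and only argument substitutions are handled, not edits that change the formula name or the logical connectives.

```python
def argument_feedback(solution, submitted):
    """Compare two atomic formulas with the same name, argument by argument."""
    name, *want = solution
    other, *got = submitted
    if name != other or len(want) != len(got):
        return None                       # not comparable in this sketch
    kinds = {str: "substring", int: "number"}
    hints = []
    for w, g in zip(want, got):
        if w != g:
            if w in (">", "=", "<"):
                kind = "operator"
            else:
                kind = kinds.get(type(w), "argument")
            hints.append(f"{kind} {w} instead of {g}")
    return "Consider " + " and ".join(hints) + "." if hints else None

message = argument_feedback(("num", "abab", ">", 0), ("num", "ab", "=", 0))
print(message)
```

For the num(abab, >, 0) versus num(ab, =, 0) example, this reproduces the feedback sentence quoted above.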


4.7 Converting Logic Formulas to NL Descriptions 


Table 5 shows the NL descriptions for each atomic logic formula used in the
rule-based translation of logic formulas into NL descriptions. When a logic formula
is formed by combining two or more atomic formulas, such as φ1 and φ2, using
logical connectives, we simply combine the corresponding NL descriptions. For
example, let NL(φ) be the NL description of an atomic logic formula φ following
the rules in Table 5. Then, NL(φ1 ∧ φ2) is defined as ‘The set of strings that
satisfy the following conditions: ‘NL(φ1)’ and ‘NL(φ2)’’.
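
A rule-based translation of this kind amounts to a lookup of one template per atomic formula plus a recursive case for conjunction. The sketch below covers only three atomic formulas, and its wording is adapted from Table 5 (for example, 'at position n' instead of 'at nth position'); the tuple encoding and the function name `nl` are assumptions.

```python
OPS = {"=": "", "<": "less than ", ">": "more than "}

def nl(phi):
    """Translate a (tiny subset of) logic formulas into an NL description."""
    name, *args = phi
    if name == "and":
        parts = " and ".join(f"'{nl(a)}'" for a in args)
        return f"The set of strings that satisfy the following conditions: {parts}"
    if name == "pos":
        x, n = args
        return f"strings that have substring {x} at position {n}"
    if name == "num":
        x, op, n = args
        return f"strings that contain {x} as a substring {OPS[op]}{n} times"
    if name == "len_div" and args[0] == 2:
        return "strings of even-length" if args[1] == 0 else "strings of odd-length"
    raise ValueError(f"no template for {name}")

description = nl(("and", ("pos", "a", 1), ("len_div", 2, 1)))
print(description)
```

Special cases such as len_div(2, 0) get their own phrasing because 'even-length' reads more naturally than a generic remainder description.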

Using this, we present regexes in more concise form even when the submitted 
regex is correct. Let us consider the problem ‘all runs of a’s have lengths that are 
multiples of three’. Note that a regex (aaa + b)* can be a solution. If a student 
submits (aaa + b*)* + b* as a solution, then the system should give the full 
grade since the submitted regex recognizes the desired regular language. While 
assigning a full grade to the submission, our algorithm provides (aaa + b)* to 
the student by computing the concise normal form [11] of the submission so that 
the student can recognize that there is a better solution (in terms of syntactic 
conciseness). 


5 Experiments 


We recruited 20 undergraduate students who were taking or had taken an 
automata course at the time of conducting our research, and ran our automatic 
grading algorithm on students’ regex submissions for ten selected exercises 
from famous automata textbooks [9,14,18]. In order to compare the results of 
automated grading with the previous approaches including RegED [10] and AT 
v3 [5], we implemented the algorithms in Python 3 on our own and used them for 
comparison. We cannot use the existing implementations directly, because they 
do not support a feature of adjusting the maximum number of allowed edits, and 
not all of them are supported as a tool. We utilized the Python 3 port* of the 
FAdo [1] package, which is an open-source library for the symbolic manipulation 
of automata and other computation models. We also restricted the number of 
edits allowed for partial grades to two in our algorithm and AT v3, and one in 
RegED since RegED applies edits from both solutions and submissions. 


5.1 Main Results 


Table 6 shows the experimental results in terms of the statistics of grading results. 
We present the ratio of submissions that received partial grades by the considered 


* https://github.com/0xnurl/fado-python3
Table 5. Natural language descriptions of our declarative logic formulas.

Logic Formula               Description

single_word(x)              only a single string x
pos(x, n)                   strings that have substring x at nth position
pos_rev(x, n)               strings that have substring x at nth position in reverse
                            order
len(=, n)                   strings of length n
len(<, n)                   strings shorter than n
len(>, n)                   strings longer than n
len_div(2, 0)               strings of even-length
len_div(2, 1)               strings of odd-length
len_div(m, n)               strings that have a remainder of n when its length is
                            divided by m
pos_every(x, 2, 0)          strings in which character x appears every even-position
pos_every(x, 2, 1)          strings in which character x appears every odd-position
pos_every(x, m, n)          strings in which substring x appears every kth position
                            such that k mod m = n
pos_every_rev(x, 2, 0)      strings in which character x appears every even-position
                            in reverse order
pos_every_rev(x, 2, 1)      strings in which character x appears every odd-position
                            in reverse order
pos_every_rev(x, m, n)      strings in which substring x appears every kth position
                            in reverse order such that k mod m = n
num(x, =, n)                strings that contain x as a substring n times
num(x, <, n)                strings that contain x as a substring less than n times
num(x, >, n)                strings that contain x as a substring more than n times
num_div(x, 2, 0)            strings that contain an even number of x’s
num_div(x, 2, 1)            strings that contain an odd number of x’s
num_div(x, m, n)            strings that contain x’s such that the number of its
                            appearance modulo m is n
allX_followedbyY(x, y)      strings in which every substring x is followed by y
allX_followingY(x, y)       strings in which every substring x follows y
allX_beforeY(x, y)          strings in which every substring x appears before y
exists_between(x, y, n)     strings in which substring x appears n times between
                            every pair of y
consecutive(x, =, n)        strings in which every x appears n times consecutively
consecutive(x, <, n)        strings in which every x appears less than n times
                            consecutively
consecutive(x, >, n)        strings in which every x appears more than n times
                            consecutively
consecutive_div(x, 2, 0)    strings in which every consecutive x’s have even-length
consecutive_div(x, 2, 1)    strings in which every consecutive x’s have odd-length
consecutive_div(x, m, n)    strings in which every consecutive x’s have a length such
                            that when the length is divided by m, the remainder is n


grading algorithms in the ‘Partial Total’ column. The ‘Partial G_syn’ column shows
the ratio of regexes that received a partial ‘syntactic grade’ by AT v3, RegED, and
Table 6. Performance comparisons of the proposed grading algorithm with baseline
algorithms proposed in previous works [5,10].

Algorithm                     Partial G_syn   Partial G_log   Partial Total
AT v3 [5]                     30.2%           7.0%            30.2%
RegED [10]                    45.3%           9.3%            45.3%
Syntactic grading (Ours)      37.2%           8.7%            37.2%
Logical grading (Ours)        10.5%           12.2%           12.2%
Corner case grading (Ours)    6.4%            0.0%            6.4%
Our algorithm                 39.0%           12.2%           40.7%


our syntactic grading algorithms over all regexes. Since AT v3 and RegED only
consider syntactic grading, values in this column show the ratio of regexes that
received partial grades over all regexes. On the other hand, the ‘Partial G_log’
column shows the ratio of regexes that received a partial ‘logical grade’ by our
algorithm over all regexes. We see that AT v3 and RegED fail to assign partial
grades to some regexes, as they only consider syntactic differences with solution
regexes, not the logic formulas behind the problem descriptions. Note that a higher
ratio of partial grades does not always mean that the grades are ‘well-deserved’;
what matters is whether each partial grade is convincing. We explain in the
following section why RegED gives more partial grades than ours and why giving
more partial grades is not necessarily a good choice.

To put it briefly, RegED gives partial grades to more regexes (45.3%) than AT 
v3 (30.2%) and even ours (40.7%). Table 7 shows several examples of the grades 
and feedback examples for students’ submissions to the five problems in Table 1. 


5.2 Validity of Grading Results 


In order to verify that our algorithm indeed assigns partial grades to submissions 
that are ‘well-deserved’, we provide several reasons. 

First, we can find logical partial grades while AT v3 and RegED cannot.
We demonstrate two examples. For the problem ‘even number of a’s’, our
algorithm assigns a partial grade to the submission (a + ba*b)*, while a possible
solution is (b + ab*a)*. Our logical grading module gives a partial grade, as it is
possible that the student made the simple mistake of confusing a with b. For the
problem ‘contains at most three a’s’, our algorithm assigns a partial grade to
b*(a + λ)b*(a + λ)b*(a + λ)b*(a + λ)b*, while one of the possible solutions is
b*(a + λ)b*(a + λ)b*(a + λ)b*. This is again possible due to our logical grading
module, as the student could have confused the numbers.

Second, our syntactic grading gives some partial grades with tree-edit while 
others cannot. For example, our syntactic grading gives a partial grade to 
(b*a*)abab(b*a*) for the problem ‘contains the substring abab’ as we may in- 
sert two star operators for the occurrences of (b*a*). However, RegED and AT 
Table 7. Grading and feedback examples generated by our regex grading algorithm for
problems in Table 1. We denote a + b by σ for brevity.

No.  Student’s Regex              Grade  Feedback Example

3    σ*(abab)*σ*                  7      Remove the star operator to convert it into
                                         σ*ababσ*.
3    b*(abab)*a*                  0      Should accept {aabab, ababb, aaabab, aababa}.
4    aσ*b                         3      Strings should begin with b instead of a
                                         (pos). Strings should end with a instead of b
                                         (pos_reverse).
4    bσ*a*                        6      Insert a to convert it into bσ*a*a.
8    (b*ab*ab*)*                  6      Include strings of b* by inserting a union
                                         operator and b.
8    (a + ba*b)*                  6      Strings should contain an even number of a’s
                                         instead of b’s (num_div).
11   ((b + λ)a)*                  3      Each a instead of b should be followed by b
                                         instead of a (allX_followedbyY).
11   (ab)*                        3      Insert a union operator and λ to convert it
                                         into ((a + λ)b)*.
26   (a + ba)*bbb*(a + ab)*       0      Should not accept {bbb, bbbb, abbba}.
26   (a + ba)*(b + λ)(a + ab)*    6      Insert a concatenation operator with b to
                                         convert it into (a + ba)*(bb + λ)(a + ab)*.


v3 will not assign a partial grade if they are provided (a + b)*abab(a + b)* and
(b + a)*abab(b + a)* as possible solutions, whereas our algorithm uses a logic
formula as the solution. This is because RegED utilizes only one solution regex for
comparison with the submitted regex and allows edits to both the solution and the
submitted regex. RegED performs an edit on the solution regex and on the submitted
regex, respectively, to improve speed, but if the solution regex is not given in an
ideal form, as in the above example, RegED cannot grade properly. To solve this
problem, all possible variants of the solution regex would have to be considered for
editing and comparison, which is very time-consuming. Our algorithm can compare
against every possible candidate without additional time, as our regex grading uses
a logic formula for the solution and permits edits only to the submitted regex.


Third, the string edit used by RegED tends to cover far more candidates
than our tree edit. For instance, it can change a+b+c to a*b+c and aab+c
with a single edit. This may differ depending on the TA’s point of view, but we
believe that edits should be applied more strictly, respecting the tree structure
that is inherent to regexes. Since string edits are more permissive than tree edits,
they cover more regexes, including ones reached by edits that were not intended.
Assigning more partial grades is therefore not always the right direction.
Table 8. Evaluation for the similarity with TA partial grades.


Algorithm Precision Recall F1 score 
AT v3 [5] 60.3% 50.8% 54.8% 
RegED [10] 57.6% 71.1% 63.2% 
Syntactic grading (Ours) 60.4% 53.38% 56.3% 
Logical grading (Ours) 73.3% 25.8% 37.9% 


Corner case grading (Ours) 60.0% 10.2% 17.3% 
Our algorithm 62.2% 65.6% 63.3% 


5.3 Comparison with TA Partial Grade 


Table 8 shows how well the grading results of the algorithms align with
the human TAs’ grading results. We asked five human TAs to grade 167
incorrect regex submissions by students. First, we calculate the precision, recall,
and F1 score for each algorithm against each TA. Precision is the percentage of
partial grades given by the algorithm that match the TA’s, and recall is the percentage
of the TA’s partial grades that the algorithm agrees with. Then we average
these scores over the five TAs. Since
correct submissions should always receive full marks, we only consider incorrect
submissions and check whether or not the human TAs gave partial grades to the
submissions. In other words, we assume that human TAs always make the right
decisions in terms of giving partial grades to incorrect submissions and consider
the cases where partial grades are given as positive cases. The
‘Precision’ column indicates how carefully the algorithms select
submissions that deserve partial grades, and the ‘Recall’ column shows how rarely the
algorithms miss such cases.
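
The metric definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `partial_grade_scores` and `average_over_tas`, and the encoding of each grading as a list of booleans (True when a submission received a partial grade), are hypothetical and are not taken from the evaluation scripts behind Table 8.

```python
from typing import Sequence, Tuple


def partial_grade_scores(
    algo: Sequence[bool], ta: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one algorithm against one TA.

    algo[i] / ta[i] is True when incorrect submission i got a partial grade.
    """
    if len(algo) != len(ta):
        raise ValueError("both gradings must cover the same submissions")
    true_pos = sum(1 for a, t in zip(algo, ta) if a and t)
    algo_pos = sum(algo)  # partial grades given by the algorithm
    ta_pos = sum(ta)      # partial grades given by the TA
    precision = true_pos / algo_pos if algo_pos else 0.0
    recall = true_pos / ta_pos if ta_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_over_tas(algo, tas):
    """Average each of the three scores over several TAs."""
    scores = [partial_grade_scores(algo, ta) for ta in tas]
    return tuple(sum(col) / len(scores) for col in zip(*scores))
```

Because the three scores are averaged per TA in this sketch, the averaged F1 score need not equal the harmonic mean of the averaged precision and recall.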


Overall, our grading algorithm shows the best performance in terms of the
F1 score, which is the harmonic mean of precision and recall. RegED is
placed second, a tiny gap behind our algorithm, with AT v3
following it.


Intuitively, it is natural that the recall is highest in RegED as RegED covers
more regexes than the other compared algorithms. We can also see from the
high precision of the logical grading module that the partial grade submissions
captured by the logical grading module are quite precise, even compared with
the other modules used in our algorithm. However, the logical grading misses
most of the regexes that received partial grades from TAs, which the other algorithms capture more often.
On the other hand, the syntactic grading captures many more regexes that
received partial grades from TAs than the other modules in our algorithm. This also
shows that human TAs tend to give partial grades to submissions with syntactic
mistakes rather than to submissions with logical mistakes.


Fig. 3. Runtime comparison with and without the reverse trick. Sn and cn indicate problems
corresponding to the logic formulas pos_rev(a, n) and pos_rev(a, n) ∧ num(bba, >, 0), respectively.


5.4 Effectiveness of the Regex Reverse Trick 


We demonstrate the effectiveness of the reverse trick in terms of runtime
complexity reduction of the proposed algorithm in Fig. 3. There is no noticeable
difference for short regexes. However, we can see that the time increases on a log
scale as the length of the regex increases.


5.5 User Study 


In Fig. 4, we provide a screenshot of a web page for the online ‘Regex Trainer’
in which our regex grading algorithm is employed. On the online Regex Trainer
page, the system displays each regex construction problem in turn to a student.
Once the student enters an answer to the problem, the system shows the
grade with feedback and displays the next problem.

We conducted a user study by asking five questions to nine students who
tested the usability and usefulness of our regex grading algorithm.
The result is shown in Table 9. Each student was asked to answer
each question on a Likert scale from 1 (strongly disagree) to 5 (strongly agree).
The average scores for the five questions all lie in the range
[3.7, 4.4], which suggests that the students in general find our grading system
easy to use and useful for studying regexes.


5.6 Limitations 


In the following, we leave a list of limitations of our study. First, the proposed set 
of logic formulas cannot express the entire class of regular languages. In future 
work, we may extend the set of formulas by adding useful logic formulas that 


110 S.-H. Kim et al. 


Welcome to Regex Trainer 


Our regex syntax supports the following operations: '*' (Kleene-star), '+' (union), '@epsilon' ('b?' -> 'b+@epsilon'), and sigma. The alphabet is ['a', 'b']
Solution syntactic grade: the regex (tree) edit-distance between the submitted regex and the solution regex (one closest to the submitted regex) 


Problem syntactic grade: the logic edit-distance between the problem logic description and the logic described by submitted regex 
Semantic grade: the ratio of false positive + negative examples 


Problem 1: Begin with 'b' and end with 'a'.

Me: b(a+b)a 

Response 

Solution syntactic grade: 6 (number of edits:1) 

Feedback: You may edit your regex as follows: [Add star operator] 

Then, the edited regex will be b(a|b)*a 

Problem syntactic grade: 0 (number of edits:None) 

Semantic grade: 0 

Your regex should accept the following examples: ['baaa', 'baba', 'bbaa', 'bbba', 'baaaa', 'baaba', 'babaa', 'babba', 'bbaaa', 'bbaba']


Final grade: 6 


Fig. 4. A screenshot taken from the web page of online ‘Regex Trainer’ where our 
automatic grading module is used inside. 


Table 9. Student survey result. Nine students gave their judgments for the following 
five questions on a Likert scale from 1 to 5. 


Question                                                             Score
The grading module is easy to use.                                   4.4
I agree with the given partial grade.                                3.8
Feedback for each partial grade is helpful and instructive.          3.9
Feedback is not misleading.                                          3.7
Feedback and NL description improved my understanding of the regex.  3.9


are suitable for potential regex construction problems. Second, there could be
other approaches to catching students’ mistakes. We suggest three partial grades
that catch syntactic, logical, and corner case mistakes. Identifying further causes of
mistakes could provide richer and more detailed feedback for students. Moreover,
our grading algorithm is very likely to take too much time if the submitted
regex is unnecessarily long, since in this case the number of regexes that must
be examined increases exponentially.


6 Conclusions 


Due to the transition from face-to-face teaching to online, distance learning, the 
importance of developing an automated grading system has become more evident. 
We have presented an efficient and powerful automated grading algorithm for 
regexes in undergraduate automata and formal language courses. Our algorithm 
takes students’ regex submissions and assigns appropriate grades with productive 
feedback to the regexes by considering the syntactic and semantic alignment 
between the submitted regexes and the problem definition. Moreover, by employing
several heuristics such as the reverse trick and intermediate regex simplification,
we reduced the runtime complexity of the repetitive regex
equivalence tests for grading regexes.


Abstract. State of the art optimisation passes for dependently typed
languages can help erase the redundant information typical of invariant- 
rich data structures and programs. These automated processes do not 
dramatically change the structure of the data, even though more efficient 
representations could be available. 

Using Quantitative Type Theory as implemented in Idris 2, we demon- 
strate how to define an invariant-rich, typechecking-time data structure 
packing an efficient runtime representation together with runtime irrele- 
vant invariants. The compiler can then aggressively erase all such invari- 
ants during compilation. 

Unlike other approaches, the complexity of the resulting representation 
is entirely predictable, we do not require both representations to have 
the same structure, and yet we are able to seamlessly program as if we 
were using the high-level structure. 


Keywords: Quantitative Type Theory · Indexed families · Runtime representation · Idris 2


1 Introduction 


Dependently typed languages have empowered users to precisely describe their 
domain of discourse by using inductive families [13]. Programmers can bake 
crucial invariants directly into their definitions thus refining both their functions’ 
inputs and outputs. The constrained inputs allow them to only consider the 
relevant cases during pattern matching, while the refined outputs guarantee that 
client code can safely rely on the invariants being maintained. This programming 
style is dubbed ‘correct by construction’. 

However, relying on inductive families can have a non-negligible runtime
cost if the host language compiles them naively. And even state of the art
optimisation passes for dependently typed languages cannot work miracles: if
the source code is not efficient, the executable will not be either.

A state of the art compiler will for instance successfully compile length- 
indexed lists to mere lists thus reducing the space complexity from quadratic to 
linear in the size of the list. But, confronted with a list of booleans whose length 
is statically known to be less than 64, it will fail to pack it into a single machine 
word thus spending linear space when constant would have sufficed. 

In section 2, we will look at an optimisation example that highlights both
the strengths and the limitations of the current state of the art when it comes to 
removing the runtime overheads potentially incurred by using inductive families. 

In section 3 we will give a quick introduction to Quantitative Type Theory, 
the expressive language that grants programmers the ability to have both strong 
invariants and, reliably, a very efficient runtime representation. 

In section 4 we will look at an inductive family that we use in a performance- 
critical way in the TypOS project [2] and whose compilation suffers from the 
limitations highlighted in section 2. Our current and unsatisfactory approach is 
to rely on the safe and convenient inductive family when experimenting in Agda 
and then replace it with an unsafe but vastly more efficient representation in our 
actual Haskell implementation. 

Finally in section 5, we will study the actual implementation of our efficient 
and invariant-rich solution implemented in Idris 2. We will also demonstrate 
that we can recover almost all the conveniences of programming with inductive 
families thanks to smart constructors and views. 


2 An Optimisation Example 


The prototypical examples of the naive compilation of inductive families being 
inefficient are probably the types of vectors (Vect) and finite numbers (Fin). 
Their interplay is demonstrated by the lookup function. Let us study this exam- 
ple and how successive optimisation passes can, in this instance, get rid of the 
overhead introduced by using indexed families over plain data. 

A vector is a length-indexed list. The type Vect is parameterised by the type 
of values it stores and indexed over a natural number corresponding to its length. 
More concretely, its Nil constructor builds an empty vector of size Z (i.e. zero), 
and its (::) (pronounced ‘cons’) constructor combines a value of type a (the 
head) and a subvector of size n (the tail) to build a vector of size (S n) (i.e. 
successor of n). 


data Vect : Nat -> Type -> Type where
  Nil : Vect Z a
  (::) : a -> Vect n a -> Vect (S n) a


The size n is not explicitly bound in the type of (::). In Idris 2, this means 
that it is automatically generalised over in a prenex manner reminiscent of the 
handling of free type variables in languages in the ML family. This makes it an 
implicit argument of the constructor. Consequently, given that Nat is a type of 
unary natural numbers, a naïve runtime representation of a (Vect n a) would 
have a size quadratic in n. A smarter representation with perfect sharing would 
still represent quite an overhead as observed by Brady, McBride, and McK- 
inna [6]. 
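
To make the quadratic bound concrete, here is a small Python sketch that counts constructors. It is a back-of-the-envelope model under stated assumptions, not a description of any compiler's memory layout: every (::) cell is assumed to store the length of its tail as a separate, unshared unary number, stored elements are not counted, and the function names are hypothetical.

```python
def unary_nat_size(k: int) -> int:
    """Number of constructors in the unary numeral for k: k times S, then Z."""
    return k + 1


def naive_vect_size(n: int) -> int:
    """Constructor count for a length-n vector without any sharing.

    Each (::) cell stores the length of its tail as an unshared unary Nat.
    """
    size = 1  # the Nil constructor
    for tail_len in range(n):
        size += 1 + unary_nat_size(tail_len)  # the cell plus its implicit n
    return size


def list_size(n: int) -> int:
    """Constructor count for a plain list of length n."""
    return n + 1
```

In this model the total is 1 + 2n + n(n-1)/2 constructors, against n + 1 for a plain list; for n = 100 that is 5151 against 101.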

A finite number is a number known to be strictly smaller than a given natural
number. The type Fin is indexed by said bound. Its Z constructor models 0 and
is bound by any non-zero bound, and its S constructor takes a number bound by
n and returns its successor, bound by (1 + n). A naive compilation would here
also lead to a runtime representation suffering from a quadratic blowup.


data Fin : Nat -> Type where
  Z : Fin (S n)
  S : Fin n -> Fin (S n)


This leads us to the definition of the lookup function. Provided a vector of 
size n and a finite number k bound by this same n, we can define a total function 
looking up the value stored at position k in the vector. It is guaranteed to return 
a value. Note that we do not need to consider the case of the empty vector in 
the pattern matching clauses as all of the return types of the Fin constructors 
force the index to be non-zero and, because the vector and the finite number talk 
about the same n, having an empty vector would automatically imply having a 
value of type (Fin 0) which is self-evidently impossible. 


lookup : Vect n a -> Fin n -> a 
lookup (x :: _) Z = x 
lookup (_ :: xs) (S k) = lookup xs k 


Thanks to our indexed family, we have gained the ability to define a function 
that cannot possibly fail, as well as the ability to only talk about the pattern 
matching clauses that make sense. This seemed to be at the cost of efficiency but 
luckily for us there has already been extensive work on erasure to automatically 
detect redundant data [6] or data that will not be used at runtime [22]. 


2.1 Optimising Vect, Fin, and lookup 


An analysis in the style of Brady, McBride, and McKinna’s [6] can solve the 
quadratic blowup highlighted above by observing that the natural number a 
vector is indexed by is entirely determined by the spine of the vector. In partic- 
ular, the length of the tail does not need to be stored as part of the constructor: 
it can be reconstructed as the predecessor of the length of the overall vector. 
As a consequence, a vector can be adequately represented at runtime by a pair 
of a natural number and a list. Similarly a bounded number can be adequately 
represented by a pair of natural numbers. Putting all of this together and re- 
membering that the vector and the finite number share the same n, lookup can 
be compiled to a function taking two natural numbers and a list. In Idris 2 
we would write the optimised lookup as follows (we use the partial keyword 
because this transformed version is not total at that type). 


partial
lookup : (n : Nat) -> List a -> Nat -> a
lookup (S n) (x :: _) Z = x
lookup (S n) (_ :: xs) (S k) = lookup n xs k


We can see in the second clause that the recursive call is performed on the tail 
of the list (formerly vector) and so the first argument to lookup corresponding 
to the vector’s size is decreased by one. The invariant, despite not being explicit
anymore, is maintained. 

A Tejiščák-style analysis [22] can additionally notice that the lookup function 
does not use the bound’s value and drop it. This leads to the lookup function 
on vectors being compiled to its partial-looking counterpart acting on lists. 


partial
lookup : List a -> Nat -> a
lookup (x :: _) Z = x
lookup (_ :: xs) (S k) = lookup xs k


Even though this is in our opinion a pretty compelling example of erasing 
away the apparent complexity introduced by inductive families, this approach 
has two drawbacks. 

Firstly, it relies on the fact that the compiler can and will automatically 
perform these optimisations. But nothing in the type system prevents users 
from inadvertently using a value they thought would get erased, thus preventing 
the Tejiščák-style optimisation from firing. In performance-critical settings, users
may rather want to state their intent explicitly and be kept to their word by the 
compiler in exchange for predictable and guaranteed optimisations. 

Secondly, this approach is intrinsically limited to transformations that pre- 
serve the type’s overall structure: the runtime data structures are simpler but 
very similar still. We cannot expect much better than that. It is so far unrealistic 
to expect e.g. a change of representation to use a balanced binary tree instead 
of a list in order to get logarithmic lookups rather than linear ones. 


2.2 No Magic Solution 


Even if we are able to obtain a more compact representation of the inductive 
family at runtime through enough erasure, this does not guarantee runtime effi- 
ciency. As the Coq manual [11] reminds its users, extraction does not magically 
optimise away a user-defined quadratic multiplication algorithm when extracting 
unary natural numbers to an efficient machine representation. In a pragmatic 
move, Coq, Agda, and Idris 2 all have ad-hoc rules to replace convenient but inef- 
ficiently implemented numeric functions with asymptotically faster counterparts 
in the target language. 

However, this approach is not scalable: while we may be willing to extend our
trusted core with a high quality library for unbounded integers, we do not want to
replace our code, proven correct only thanks to complex invariants, with a wildly
different untrusted counterpart purely for efficiency reasons.

In this paper we use Quantitative Type Theory [16,4] as implemented in Idris 
2 [5] to bridge the gap between an invariant-rich but inefficient representation 
based on an inductive family and an unsafe but efficient implementation us- 
ing low-level primitives. Inductive families allow us to view [24,18] the runtime 
relevant information encoded in the low-level and efficient representation as an 
information-rich compile time data structure. Moreover the quantity annotations 
guarantee the erasure of this additional information during compilation. 


3 Some Key Features of Idris 2


Idris 2 implements Quantitative Type Theory, a Martin-Löf type theory enriched
with a semiring of quantities classifying the ways in which values may be used. 
In a type, each binder is annotated with the quantity by which its argument 
must abide. 


3.1 Quantities 


A value may be runtime irrelevant, linear, or unrestricted. 

Runtime irrelevant values (0 quantity) cannot possibly influence control flow
as they will be erased entirely during compilation. This forces the language
to impose strong restrictions on pattern-matching over these values. Typical
examples are types like the a parameter in (List a), or indices like the natural
number n in (Vect n a). These are guaranteed to be erased at compile time. The
user can state that an argument ought to be runtime irrelevant and the language will insist that it
needs to be convinced it indeed is.

Linear values (1 quantity) have to be used exactly once. Typical examples 
include the %World token used by Idris 2 to implement the IO monad à la Haskell,
or file handles that cannot be discarded without first explicitly closing the file. 
At runtime these values can be updated destructively. We will not use linearity 
in this paper. 

Last, unrestricted values (denoted by no quantity annotation) can flow into 
any position, be duplicated or thrown away. They are the usual immutable values 
of functional programming. 

The most basic of examples mobilising both the runtime irrelevance and 
unrestricted quantities is the identity function. 


id : {0 a : Type} -> (x : a) -> a
id x = x


Its type starts with a binder using curly braces. This means it introduces an 
implicit variable that does not need to be filled in by the user at call sites and 
will be reconstructed by unification. The variable it introduces is named a and 
has type Type. It has the 0 quantity annotation which means that this argument
is runtime irrelevant and so will be erased during compilation. 

The second binder uses parentheses. It introduces an explicit variable whose 
name is x and whose type is the type a that was just bound. It has no quantity 
annotation which means it will be an unrestricted variable. 

Finally the return type is the type a bound earlier. This is, as expected, a 
polymorphic function from a to a. It is implemented using a single clause that 
binds x on the left-hand side and immediately returns it on the right-hand side. 

If we were to try to annotate the binder for x with a 0 quantity to make it
runtime irrelevant then Idris 2 would rightfully reject the definition. The following
failing block shows part of the error message complaining that x cannot be
used at an unrestricted quantity on the right-hand side.


failing "x is not accessible in this context."
  id : {0 a : Type} -> (0 x : a) -> a
  id x = x


3.2 Proof Search 


In Idris 2, Haskell-style ad-hoc polymorphism [25] is superseded by a more gen- 
eral proof search mechanism. Instead of having blessed notions of type classes, 
instances and constraints, the domain of any dependent function type can be 
marked as auto. This signals to the compiler that the corresponding argument 
will be an implicit argument and that it should not be reconstructed by uni- 
fication alone but rather by proof search. The search algorithm will use the 
appropriate user-declared hints as well as the local variables in scope. 

By default, a datatype’s constructors are always added to the database of 
hints. And so the following declaration brings into scope both an indexed family 
So of proofs that a given boolean is True, and a unique constructor Oh that is 
automatically added as a hint. 


data So : Bool -> Type where 
Oh : So True 


As a consequence, we can for instance define a record type specifying what 
it means for n to be an even number by storing its half together with a proof 
that is both runtime irrelevant and filled in by proof search. Because (2 * 3 == 
6) computes to True, Idris 2 is able to fill-in the missing proof in the definition 
of even6 using the Oh hint. 


record Even (n : Nat) where
  constructor MkEven
  half : Nat
  {auto 0 prf : So (2 * half == n)}


even6 : Even 6
even6 = MkEven { half = 3 }


We will use both So and the auto mechanism in section 5.3. 


3.3 Application: Vect, as List 


We can use the features of Quantitative Type Theory to give an implementa- 
tion of Vect that is guaranteed to erase to a List at runtime independently of 
the optimisation passes implemented by the compiler. The advantage over the 
optimisation passes described in section 2 is that the user has control over the 
runtime representation and does not need to rely on these optimisations being 
deployed by the compiler. 

The core idea is to make the slogan ‘a vector is a length-indexed list’ a reality 
by defining a record packing together the encoding as a list and a proof its length 
is equal to the expected Nat index. This proof is marked as runtime irrelevant 
to ensure that the list is the only thing remaining after compilation. 


record Vect (n : Nat) (a : Type) where
  constructor MkVect
  encoding : List a
  0 valid : length encoding === n


Smart constructors Now that we have defined vectors, we can recover the usual 
building blocks for vectors by defining smart constructors, that is to say func- 
tions Nil and (::) that act as replacements for the inductive family’s data 
constructors. 


Nil : Vect Z a
Nil = MkVect [] Refl


The smart constructor Nil returns an empty vector. It is, unsurprisingly, 
encoded as the empty list ([]). Because (length []) statically computes to Z, 
the proof that the encoding is valid can be discharged by reflexivity. 


(::) : a -> Vect n a -> Vect (S n) a 
x :: MkVect xs eq = MkVect (x :: xs) (cong S eq) 


Using (::) we can combine a head and a tail of size n to obtain a vector of
size (S n). The encoding is obtained by consing the head in front of the tail’s
encoding and the proof this is valid (cong S eq) uses the fact that propositional
equality is a congruence and that (length (x :: xs)) computes to
(S (length xs)).


View Now that we know how to build vectors, we demonstrate that we can also 
take them apart using a view. 

A view for a type T, in the sense of Wadler [24], and as refined by McBride 
and McKinna [18], is an inductive family V indexed by T together with a total 
function mapping every element t of T to a value of type (V t). This simple
gadget provides a powerful, user-extensible, generalisation of pattern-matching. 
Patterns are defined inductively as either a pattern variable, a forced term (i.e. 
an arbitrary expression that is determined by a constraint arising from another 
pattern), or a data constructor fully applied to subpatterns. In contrast, the 
return indices of an inductive family’s constructors can be arbitrary expressions. 

In the case that interests us, the view allows us to emulate ‘matching’ on 
which of the two smart constructors Nil or (::) was used to build the vector 
being taken apart. 


data View : Vect n a -> Type where
  Nil : View Nil
  (::) : (x : a) -> (xs : Vect n a) -> View (x :: xs)


The inductive family View is indexed by a vector and has two constructors 
corresponding to the two smart constructors. We use Idris 2’s overloading capa- 
bilities to give each of the View’s constructors the name of the smart constructor 
it corresponds to. By pattern-matching on a value of type (View xs), we will be 
able to break xs into its constitutive parts and either observe it is equal to Nil
or recover its head and its tail. 


view : (xs : Vect n a) -> View xs 
view (MkVect [] Refl) = Nil 
view (MkVect (x :: xs) Refl) = x :: MkVect xs Refl 


The function view demonstrates that we can always tell which constructor 
was used by inspecting the encoding list. If it is empty, the vector was built 
using the Nil smart constructor. If it is not then we got our hands on the head 
and the tail of the encoding and (modulo some re-wrapping of the tail) they are 
effectively the head and the tail that were combined using the smart constructor. 


Application: map We can then use these constructs to implement the function 
map on vectors without ever having to explicitly manipulate the encoding. The 
maximally sugared version of map is as follows: 


map : (a -> b) -> Vect n a -> Vect n b
map f xs@_ with (view xs)
  _ | [] = []
  _ | hd :: tl = f hd :: map f tl


On the left-hand side the view lets us seamlessly pattern-match on the input 
vector. Using the with keyword we have locally modified the function definition 
so that it takes an extra argument, here the result of the intermediate compu- 
tation (view xs). Correspondingly, we have two clauses matching on this extra 
argument; the symbol | separates the original left-hand side (here elided using 
_ because it is exactly the same as in the parent clause) from the additional 
pattern. This pattern can either have the shape [] or (hd :: tl) and, correspondingly,
we learn that xs is either [] or (hd :: tl).

On the right-hand side the smart constructors let us build the output vector. 
Mapping a function over the empty vector yields the empty vector while mapping 
over a cons node yields a cons node whose head and tail have been modified. 

This sugared version of map is equivalent to the following more explicit one: 


map : (a -> b) -> Vect n a -> Vect n b
map f xs with (view xs)
  map f .([]) | [] = []
  map f .(hd :: tl) | hd :: tl = f hd :: map f tl


In the parent clause we have explicitly bound xs instead of merely introducing 
an alias for it by writing (xs@_) and so we will need to be explicit about the ways 
in which this pattern is refined in the two with-clauses. 

In the with-clauses, we have explicitly repeated the refined version of the 
parent clause’s left-hand side. In particular we have used dotted patterns to 
insist that xs is now entirely forced by the match on the result of (view xs). 

We have seen that by matching on the result of the (view xs) call, we get to 
‘match’ on xs as if Vect were an inductive type. This is the power of views. 


Application: lookup The type (Fin n) can similarly be represented by a single
natural number and a runtime irrelevant proof that it is bound by n. We
leave these definitions out, and invite the curious reader to either attempt to 
implement them for themselves or look at the accompanying code. 

Bringing these definitions together, we can define a lookup function which is 
similar to the one defined in section 2. 


lookup : Vect n a -> Fin n -> a
lookup xs@_ k@_ with (view xs) | (view k)
  _ | hd :: _ | Z = hd
  _ | _ :: tl | S k' = lookup tl k'


We are seemingly using view at two different types (Vect and Fin respec- 
tively) but both occurrences actually refer to separate functions: Idris 2 lets us 
overload functions and performs type-directed disambiguation. 

For pedagogical purposes, this sugared version of lookup can also be ex- 
panded to a more explicit one that demonstrates the views’ power. 


lookup : Vect n a -> Fin n -> a
lookup xs k with (view xs) | (view k)
  lookup .(hd :: tl) .(Z) | hd :: tl | Z = hd
  lookup .(hd :: tl) .(S k') | hd :: tl | S k' = lookup tl k'


The main advantage of this definition is that, based on its type alone, we 
know that this function is guaranteed to be processing a list and a single natural 
number at runtime. This efficient runtime representation does not rely on the 
assumption that state of the art optimisation passes will be deployed. 

We have seen some of Idris 2’s powerful features and how they can be lever- 
aged to empower users to control the runtime representation of the inductive 
families they manipulate. This simple example only allowed us to reproduce the 
performance that could already be achieved by compilers deploying state of the 
art optimisation passes. In the following sections, we are going to see how we can 
use the same core ideas to compile an inductive family to a drastically different 
runtime representation while keeping good high-level ergonomics. 


4 Thinnings, Cooked Two Ways 


We experienced a major limitation of compilation of inductive families during 
our ongoing development of TypOS [2], a domain specific language to define 
concurrent typecheckers and elaborators. Core to this project is the definition 
of actors manipulating a generic notion of syntax with binding. Internally the 
terms of this syntax with binding are based on a co-de Bruijn representation (an 
encoding we will explain below) which relies heavily on thinnings. A thinning 
(also known as an Order Preserving Embedding [9]) between a source and a 
target scope is an order preserving injection of the smaller scope into the larger 
one. They are usually represented using an inductive family. The omnipresence of 
thinnings in the co-de Bruijn representation makes their runtime representation
a performance critical matter. 

Let us first remind the reader of the structure of abstract syntax trees in a 
named, a de Bruijn, and a co-de Bruijn representation. We will then discuss two 
representations of thinnings: a safe and convenient one as an inductive family, 
and an unsafe but efficient encoding as a pair of arbitrary precision integers. 


4.1 Named, de Bruijn, and co-de Bruijn Syntaxes 


In this section we will use the S combinator (λg.λf.λx.g x (f x)) as a running
example and represent terms using a syntax tree whose constructor nodes are
circles and variable nodes are squares. To depict the S combinator we will only
need λ-abstraction and application (rendered $) nodes. A constructor’s arguments
become its children in the tree. The tree is laid out left-to-right and a
constructor’s arguments are displayed top-to-bottom.


Named Syntax The first representation is using explicit names. Each binder has 
an associated name and each variable node carries a name. A variable refers to 
the closest enclosing binder which happens to be using the same name. 


Checking whether two terms are structurally equivalent (α-equivalence)
potentially requires renaming bound names. In order to have a simple and cheap
α-equivalence check we can instead opt for a nameless representation.




De Bruijn Syntax An abstract syntax tree based on de Bruijn indices [8] replaces
names with natural numbers counting the number of binders separating a
variable from its binding site. The S combinator is now written (λ λ λ 2 0 (1 0)).

You can see in the following graphical depiction that λ-abstractions do not
carry a name anymore and that variables simply point to the binder
that introduced them. We have left the squares empty but in practice the various
coloured arrows would be represented by natural numbers. For instance
the dashed magenta one corresponds to 1 because you need to ignore one
λ-abstraction (the orange one) on your way towards the root of the tree before
you reach the corresponding magenta binder.

To check whether a subterm does not mention a given set of variables (a
thickening test, the opposite of a thinning which extends the current scope with
unused variables), you need to traverse the whole term. In order to have a simple
and cheap thickening test we can ensure that each subterm knows precisely what
its support is and how it embeds in its parent’s.
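
To make the cost of this traversal concrete, here is a minimal Python sketch of the occurrence check on de Bruijn terms. It is not taken from the paper: the tuple-based term encoding and the name `mentions` are assumptions made purely for illustration. The check has to visit every node and shift the index under each binder.

```python
# De Bruijn terms as nested tuples (a hypothetical encoding for illustration):
#   ("var", k)     variable with de Bruijn index k
#   ("lam", body)  abstraction
#   ("app", f, t)  application

def mentions(k, term):
    """Return True if the variable with index k (counted from the root of
    term) occurs in term. In the worst case the whole term is visited."""
    tag = term[0]
    if tag == "var":
        return term[1] == k
    if tag == "lam":
        # Going under a binder shifts every enclosing variable by one.
        return mentions(k + 1, term[1])
    return mentions(k, term[1]) or mentions(k, term[2])

# Body of the S combinator below its three binders: 2 0 (1 0)
s_body = ("app", ("app", ("var", 2), ("var", 0)),
                 ("app", ("var", 1), ("var", 0)))
s_comb = ("lam", ("lam", ("lam", s_body)))

print(mentions(1, s_body))  # True: index 1 occurs in the body
print(mentions(3, s_body))  # False: nothing points past the three binders
print(mentions(0, s_comb))  # False: the S combinator is closed
```

A thickening test for a set of variables would repeat this check (or carry a set of indices), so its cost grows with the size of the term rather than with the size of the scope.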


Co-de Bruijn Syntax In a co-de Bruijn representation [17] each subterm selects 
exactly the variables that stay in scope for that term, and so a variable construc- 
tor ultimately refers to the only variable still in scope by the time it is reached. 
This representation ensures that we know precisely what the scope of a given 
term currently is. 

In the following graphical rendering, we represent thinnings as lists of full
(●) or empty (○) discs depending on whether the corresponding variable is either
kept or discarded. For instance the thinning represented by ○●● throws the blue
variable away, and keeps both the magenta and orange ones.


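[Figure: co-de Bruijn syntax tree of the S combinator, with each subterm annotated by a thinning such as ●○○ or ○●●.]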


We can see that in such a representation, each node in the tree stores one 
thinning per subterm. This will not be tractable unless we have an efficient 
representation of thinnings. 


4.2 The Performance Challenges of co-de Bruijn 


Using the co-de Bruijn approach, a term in an arbitrary context is represented 
by the pairing of a term in co-de Bruijn syntax with a thinning from its support 
into the wider scope. Having such a precise handle on each term’s support allows
us to make operations such as thinning, substitution, unification, or common
sub-expression elimination more efficient. 

Thinning a term does not require us to traverse it anymore. Indeed, embed- 
ding a term in a wider context will not change its support and so we can simply 
compose the two thinnings while keeping the term the same. 
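
The following Python sketch illustrates why no traversal is needed. It is an illustration under stated assumptions, not the TypOS implementation: thinnings are modelled as lists of booleans (one entry per variable of the target scope, True meaning kept), and `compose` and `thin` are hypothetical names.

```python
def compose(inner, outer):
    """Compose inner : sx -> sy with outer : sy -> sz into sx -> sz.

    A thinning is a list with one boolean per variable of its target
    scope (True = kept). inner must have exactly one entry per variable
    kept by outer."""
    assert len(inner) == sum(outer)
    kept = iter(inner)
    # A variable of sz survives only if outer keeps it and inner keeps
    # the corresponding variable of sy.
    return [next(kept) if k else False for k in outer]

def thin(scoped_term, outer):
    """Embed a (term, thinning) pair into a wider scope: the term is
    shared untouched and only the thinning is recomputed."""
    term, th = scoped_term
    return (term, compose(th, outer))

term = ("some", "co-de Bruijn", "term")    # opaque payload
scoped = (term, [True, False])              # support: 1 of 2 variables
wider = thin(scoped, [True, False, True])   # the 2 variables sit in a scope of 3
print(wider[1])          # [True, False, False]
print(wider[0] is term)  # True: the term itself was not traversed or copied
```

The work done is proportional to the size of the scope, independently of the size of the term.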

Substitution can avoid traversing subterms that will not be changed. Indeed, 
it can now easily detect when the substitution’s domain does not intersect with 
the subterm’s support. 

Unification requires performing thickening tests when we want to solve a 
metavariable declared in a given context with a term seemingly living in a
wider one. We once more do not need to traverse the term to perform this test, 
and can simply check whether the outer thinning can be thickened. 
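
Both shortcuts reduce to set operations on supports. As a hedged sketch (the bit-mask representation of supports over the wider scope and the names `disjoint` and `can_thicken` are assumptions for illustration, not the paper's API), they could look as follows in Python:

```python
def disjoint(domain, support):
    """True if no variable in the substitution's domain belongs to the
    subterm's support, so the subterm can be returned unchanged."""
    return domain & support == 0

def can_thicken(support, allowed):
    """True if every variable in the support also belongs to the narrower
    scope `allowed`, i.e. the outer thinning can be thickened."""
    return support & ~allowed == 0

# Wider scope with 4 variables; bit i stands for variable i.
support = 0b0101                        # the term uses variables 0 and 2
print(disjoint(0b0010, support))        # True: substituting variable 1 is a no-op
print(disjoint(0b0100, support))        # False: variable 2 is in the support
print(can_thicken(support, 0b0111))     # True: variable 3 is not needed
print(can_thicken(support, 0b0011))     # False: variable 2 would escape
```

Neither check inspects the term: both only look at the thinning stored alongside it.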

Common sub-expression elimination requires us to identify alpha-equivalent
terms potentially living in different contexts. Using a de Bruijn representation,
these can be syntactically different: a variable represented by the natural number
v in Γ would be (1+v) in Γ, σ but (2+v) in Γ, τ, ν. A co-de Bruijn representation,
by discarding all the variables not in the support, guarantees that we can once
more use syntactic equality to detect alpha-equivalence. This encoding is used
for instance (albeit unknowingly) by Maziarz, Ellis, Lawrence, Fitzgibbon, and
Peyton Jones in their ‘Hashing modulo alpha-equivalence’ work [14].

For all of these reasons we have, as we mentioned earlier, opted for a co-de 
Bruijn representation in the implementation of TypOS [2]. And so it is crucial 
for performance that we have a compact representation of thinnings. 


Thinnings in TypOS We first carefully worked out the trickier parts of the 
implementation in Agda before porting the resulting code to Haskell. This pro- 
cess highlighted a glaring gap between on the one hand the experiments done 
using a strongly typed inductive representation of thinnings and on the other 
hand their more efficient but unsafe encoding in Haskell. 


Agda The Agda-based experiments use inductive families that make the key 
invariants explicit which helps tracking complex constraints and catches design 
flaws at typechecking time. The indices guarantee that we always transform the 
thinnings appropriately when we add or remove bound variables. In Idris 2, the 
inductive family representation of thinnings would be written: 


data Thinning : (sx, sy : SnocList a) -> Type where
  Done : Thinning [<] [<]
  Keep : Thinning sx sy -> (0 x : a) -> Thinning (sx :< x) (sy :< x)
  Drop : Thinning sx sy -> (0 x : a) -> Thinning sx (sy :< x)


The Thinning family is indexed by two scopes (represented as snoclists i.e. lists 
that are extended from the right, just like contexts in inference rules): sx the 
tighter scope and sy the wider one. The Done constructor corresponds to a thin- 
ning from the empty scope to itself ([<] is Idris 2 syntactic sugar for the empty 
snoclist), and Keep and Drop respectively extend a given thinning by keeping or
dropping the most local variable (:< is the ‘snoc’ constructor, a sort of flipped
‘cons’). The ‘name’ (x of type a) is marked with the quantity 0 to ensure it is
erased at compile time (cf. section 3).

During compilation, Idris 2 would erase the families’ indices as they are forced 
(in the sense of Brady, McBride, and McKinna [6]), and drop the constructor 
arguments marked as runtime irrelevant. The resulting inductive type would be 
the following simple data type. 


data Thinning = Done | Keep Thinning | Drop Thinning 


At runtime this representation is therefore essentially a linked list of booleans 
(Done being Nil, and Keep and Drop respectively (True ::) and (False ::)). 


Haskell The Haskell implementation uses this observation and picks a packed 
encoding of this list of booleans as a pair of integers. One integer represents the 
length n of the list, and the other integer’s n least significant bits encode the list 
as a bit pattern where 1 is Keep and 0 is Drop.
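
A small Python sketch of this packing may help fix the conventions. It is an illustration rather than the Haskell code used in TypOS: the helper names `pack` and `unpack` are hypothetical, and thinnings are written as strings of '0' and '1' with the most local variable last, so that the last character ends up in bit 0.

```python
def pack(bits):
    """Pack a thinning written as a string over '0'/'1' (most local
    variable last) into a (length, integer) pair."""
    return (len(bits), int(bits, 2) if bits else 0)

def unpack(th):
    """Inverse of pack: recover the string of kept/dropped flags."""
    n, encoding = th
    return format(encoding, "b").zfill(n) if n else ""

print(pack("01101"))     # (5, 13)
print(unpack((5, 13)))   # 01101
print(unpack((3, 0)))    # 000
```

The length has to be stored separately because leading drops (leading zeros) are invisible in the integer alone: "01101" and "1101" share the encoding 13.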

Basic operations on thinnings are implemented by explicitly manipulating
individual bits. This encoding is not indexed and thus all the invariant tracking
has to be done by hand. This has led to numerous and hard to diagnose bugs.


Thinnings in Idris 2 Idris 2 is a self-hosting language whose core datatype is 
currently based on a well-scoped de Bruijn representation. This precise indexing 
of terms by their scope helped entirely eliminate a whole class of bugs that 
plagued Idris 1’s unification machinery. 

If we were to switch to a co-de Bruijn representation for our core language 
we would want, and should be able, to have the best of both worlds: a safe and 
efficient representation! 

Thankfully Idris 2 implements Quantitative Type Theory (QTT) which gives 
us a lot of control over what is to be runtime relevant and what is to be erased 
during compilation. This should allow us to insist on having a high-level interface 
that resembles an inductive family while ensuring that everything but a pair of 
integers is erased at compile time. We will exploit the key features of QTT 
presented in section 3 to have our cake and eat it. 


5 An Efficient Invariant-Rich Representation 


We can combine both approaches highlighted in section 4.2 by defining a record 
parameterised by a source (sx) and target (sy) scopes corresponding to the two 
ends of the thinnings, just like we would for the inductive family. This record 
packs two numbers and a runtime irrelevant proof. 

Firstly, we have a natural number called bigEnd corresponding to the size of 
the big end of the thinning (sy). We are happy to use a (unary) natural number 
here because we know that Idris 2 will compile it to an unbounded integer. 

Secondly, we have an integer called encoding corresponding to the thinning
represented as a bit vector stating, for each variable, whether it is kept
or dropped. We only care about the integer’s bigEnd least significant bits and
assume the rest is set to 0.

Thirdly, we have a runtime irrelevant proof invariant that encoding is indeed 
a valid encoding of size bigEnd of a thinning from sx to sy. We will explore the 
definition of the relation Invariant later on in section 5.3. 


record Th {a : Type} (sx, sy : SnocList a) where
  constructor MkTh
  bigEnd : Nat
  encoding : Integer
  0 invariant : Invariant bigEnd encoding sx sy


The first sign that this definition is adequate is our ability to construct any 
valid thinning. We demonstrate it is the case by introducing functions that act 
as smart constructor analogues for the inductive family’s data constructors. 


5.1 Smart Constructors for Th 


The first and simplest one is done, a function that packs a pair of 0s (the size of the
big end, and the empty encoding) together with a proof that it is an adequate
encoding of the thinning from the empty scope to itself. In this instance, the
proof is simply the Done constructor.


done : Th [<] [<]
done = MkTh { bigEnd = 0, encoding = 0, invariant = Done }


To implement both keep and drop, we are going to need to perform bit-level
manipulations. These are made easy by Idris 2’s Bits interface which provides us
with functions to shift the bit patterns left or right (shiftL, shiftR), set or clear
bits at specified positions (setBit, clearBit), perform bitwise logical operations like
disjunction (.|.) or conjunction (.&.), etc.

In both keep and drop, we need to extend the encoding with an additional
bit. For this purpose we introduce the cons function which takes a bit b and an
existing encoding bs and returns the new encoding bs·b.


cons : Bool -> Integer -> Integer
cons b bs = let bs0 = bs `shiftL` 1 in
            if b then (bs0 `setBit` 0) else bs0


No matter what the value of the new bit is, we start by shifting the encoding
to the left to make space for it; this gives us bs0 which contains the bit pattern
bs·0. If the bit is True then we need to additionally set the bit at position 0
to obtain bs·1. Otherwise if the bit is False, we can readily return the bs·0
encoding obtained by left shifting. The correctness of this function is backed by
two lemmas: testing the bit at index 0 after consing amounts to returning the
cons’d bit, and shifting the cons’d encoding to the right takes us back to the
unextended encoding.


testBit0Cons : (b : Bool) -> (bs : Integer) ->
               testBit (cons b bs) 0 === b


consShiftR : (b : Bool) -> (bs : Integer) ->
             (cons b bs) `shiftR` 1 === bs


The keep smart constructor demonstrates that from a thinning from sx to sy 
and a runtime irrelevant variable x we can compute a thinning from the extended 
source scope (sx :< x) to the target scope (sy :< x) where x was kept. 


keep : Th sx sy -> (0 x : a) -> Th (sx :< x) (sy :< x)
keep th x = MkTh
  { bigEnd = S (th .bigEnd)
  , encoding = cons True (th .encoding)
  , invariant =
      let 0 b = eqToSo $ testBit0Cons True (th .encoding) in
      Keep (rewrite consShiftR True (th .encoding) in th .invariant) x
  }


The outer scope has grown by one variable and so we increment bigEnd. The 
encoding is obtained by cons-ing the boolean True to record the fact that this 
new variable is kept. Finally, we use the two lemmas shown above to convince 
Idris 2 the invariant has been maintained. 


Similarly the drop function demonstrates that we can compute a thinning 
getting rid of the variable x freshly added to the target scope. 


drop : Th sx sy -> (0 x : a) -> Th sx (sy :< x)
drop th x = MkTh
  { bigEnd = S (th .bigEnd)
  , encoding = cons False (th .encoding)
  , invariant =
      let 0 prf = testBit0Cons False (th .encoding)
          0 nb = eqToSo $ cong not prf in
      Drop (rewrite consShiftR False (th .encoding) in th .invariant) x
  }


We once again increment the bigEnd, use cons to record that the variable is 
being discarded and use the lemmas ensuring its correctness to convince Idris 2 
the invariant is maintained. 


We can already deploy these smart constructors to implement functions pro- 
ducing thinnings. We use which as our example. It is a filter-like function that 
returns a dependent pair containing the elements that satisfy a boolean predi- 
cate together with a proof that there is a thinning embedding them back into 
the input snoclist.


which : (a -> Bool) -> (sy : SnocList a) ->
        (sx : SnocList a ** Th sx sy)
which p [<] = ([<] ** done)
which p (sy :< y) =
  let (sx ** th) = which p sy in
  if p y then (sx :< y ** keep th y)
         else (sx ** drop th y)


If the input snoclist is empty then the output shall also be, and done builds 
a thinning from [<] to itself. If it is not empty we can perform a recursive call 
on the tail of the snoclist and then depending on whether the predicate holds
true of the head we can either keep or drop it. 
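
To see which numbers this function ends up computing once the indices and proofs are gone, here is a Python model of its runtime behaviour. It is a sketch under assumptions: snoclists are modelled as Python lists with the most local element last, the thinning is returned as a plain `(big_end, encoding)` pair, and the recursion is unrolled into a loop.

```python
def which(p, sy):
    """Filter the snoclist sy (a Python list, most local element last) and
    return (sx, (big_end, encoding)) describing how sx embeds into sy."""
    sx, big_end, encoding = [], 0, 0
    for y in sy:                             # outermost element first
        keep = bool(p(y))
        encoding = (encoding << 1) | keep    # same effect as cons keep encoding
        big_end += 1
        if keep:
            sx.append(y)
    return sx, (big_end, encoding)

sx, th = which(lambda n: n % 2 == 0, [1, 2, 3, 4])
print(sx, th)   # [2, 4] (4, 5)
```

The encoding 5 is the bit pattern 0101: reading from the most local element, 4 is kept, 3 is dropped, 2 is kept and 1 is dropped.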


We are now equipped with these smart constructors that allow us to seam- 
lessly build thinnings. To recover the full expressive power of the inductive family, 
we also need to be able to take these thinnings apart. Let us now tackle this issue. 


5.2 Pattern Matching on Th 


The View family is a sum type indexed by a thinning. It has one data constructor 
associated to each smart constructor and storing its arguments. 


data View : Th sx sy -> Type where
  Done : View done
  Keep : (th : Th sx sy) -> (0 x : a) -> View (keep th x)
  Drop : (th : Th sx sy) -> (0 x : a) -> View (drop th x)


The accompanying view function witnesses the fact that any thinning arises 
as one of these three cases. 


view : (th : Th sx sy) -> View th 


We show the implementation of view in its entirety but leave out the technical
auxiliary lemmas it invokes. The interested reader can find them in the
accompanying material. We will however inspect the code view compiles to after
erasure in section 5.5 to confirm that these auxiliary definitions do not incur any 
additional runtime cost. 


We first start by pattern matching on the bigEnd of the thinning. If it is 0
then we know the thinning has to be the empty thinning. Thanks to an inversion
lemma called isDone, we can collect a lot of equality proofs: the encoding bs has
to be 0, the source and target scopes sx and sy have to be the empty snoclists,
and the proof prf of the invariant has to be of a specific shape. Rewriting by
these equalities changes the goal type enough for the typechecker to ultimately
see that the thinning was constructed using the done smart constructor and so
we can use the view’s Done constructor.


view (MkTh 0 bs prf) =
  let 0 eqs = isDone prf in
  rewrite bsIsZero eqs in
  rewrite fstIndexIsLin eqs in
  rewrite sndIndexIsLin eqs in
  rewrite invariantIsDone eqs in
  Done


In case the thinning is non-empty, we need to inspect the 0-th bit of the 
encoding to know whether it keeps or discards its most local variable. This is 
done by calling the choose function which takes a boolean b and returns a value 
of type (Either (So b) (So (not b))) i.e. we not only inspect the boolean but also
record which value we got in a proof using the So family introduced in section 3. 


view (MkTh (S i) bs prf) = case choose (testBit bs Z) of 


If the bit is set then we know the variable is kept. And so we can invoke an 
inversion lemma that will once again provide us with a lot of equalities that we 
immediately deploy to reshape the goal’s type. This ultimately lets us assemble 
a sub-thinning and use the view’s Keep constructor. 


  Left so =>
    let 0 eqs = isKeep prf so in
    rewrite fstIndexIsSnoc eqs in
    rewrite sndIndexIsSnoc eqs in
    rewrite invariantIsKeep eqs in
    rewrite isKeepInteger bs so in
    let th : Th eqs.fstIndexTail eqs.sndIndexTail
        th = MkTh i (bs `shiftR` 1) eqs.subInvariant in
    cast $ Keep th eqs.keptHead


If the bit is not set then we learn that the thinning was constructed using 
drop. We can once again use an inversion lemma to rearrange the goal and finally 
invoke the view’s Drop constructor. 


  Right soNot =>
    let 0 eqs = isDrop prf soNot in
    rewrite sndIndexIsSnoc eqs in
    rewrite invariantIsDrop eqs in
    rewrite isDropInteger bs soNot in
    let th : Th sx eqs.sndIndexTail
        th = MkTh i (bs `shiftR` 1) eqs.subInvariant in
    cast $ Drop th eqs.keptHead


We can readily use this function to implement pattern matching functions 
taking a thinning apart. We can for instance define kept, the function that 
counts the number of keep smart constructors used when manufacturing the
input thinning and returns a proof that this is exactly the length of the source
scope sx.


kept : Th sx sy -> (n : Nat ** length sx === n)
kept th = case view th of
  Done => (0 ** Refl)
  Keep th x => let (n ** eq) = kept th in
               (S n ** cong S eq)
  Drop th x => kept th


We proceed by calling the view function on the input thinning which imme- 
diately tells us that we only have three cases to consider. The Done case is easily 
handled because the branch’s refined types inform us that both sx and sy are the 
empty snoclist [<] whose length is evidently 0. In the Keep branch we learn that
sx has the shape (_ :< x) and so we must return the successor of whatever the 
result of the recursive call gives us. Finally in the Drop case, sx is untouched and 
so a simple recursive call suffices. Note that the function is correctly detected as 
total because the target scope sy is indeed getting structurally smaller at every 
single recursive call. It is runtime irrelevant but it can still be successfully used 
as a termination measure by the compiler. 
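
The runtime content of this definition can be modelled in a few lines of Python. The sketch below is an assumption-laden illustration, not the code Idris 2 generates: thinnings are `(big_end, encoding)` pairs, the view returns tagged tuples, the equality proof is dropped, and the recursion of kept is written as a loop.

```python
def view(th):
    """Expose the outermost 'constructor' of a (big_end, encoding) pair."""
    big_end, encoding = th
    if big_end == 0:
        return ("Done",)
    tail = (big_end - 1, encoding >> 1)
    return ("Keep", tail) if encoding & 1 else ("Drop", tail)

def kept(th):
    """Count the kept variables by repeatedly viewing the thinning."""
    count = 0
    v = view(th)
    while v[0] != "Done":
        if v[0] == "Keep":
            count += 1
        v = view(v[1])      # the big end decreases by one at every step
    return count

th = (5, 0b01101)
print(view(th))   # ('Keep', (4, 6))
print(kept(th))   # 3
```

On a valid encoding the result coincides with the number of set bits, which is the length of the source scope.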


5.3 The Invariant Relation 


We have shown the user-facing Th and have claimed that it is possible to define 
smart constructors done, keep, and drop, as well as a view function. This should 
become apparent once we show the actual definition of Invariant. 


Definition of Invariant The relation maintains the invariant between the 
record’s fields bigEnd (a Nat) and encoding (an Integer) and the index scopes 
sx and sy. Its definition can favour ease-of-use over runtime efficiency because we
statically know that all of the Invariant proofs will be erased during compilation. 


data Invariant : (i : Nat) -> (bs : Integer) ->
                 (sx, sy : SnocList a) -> Type where
  Done : Invariant Z 0 [<] [<]
  Keep : Invariant i (bs `shiftR` 1) sx sy -> (0 x : a) ->
         {auto 0 b : So (testBit bs Z)} ->
         Invariant (S i) bs (sx :< x) (sy :< x)
  Drop : Invariant i (bs `shiftR` 1) sx sy -> (0 x : a) ->
         {auto 0 nb : So (not (testBit bs Z))} ->
         Invariant (S i) bs sx (sy :< x)


As always, the Done constructor is the simplest. It states that the thinning
of size Z and encoded as the bit pattern 0 is the empty thinning.

The Keep constructor guarantees that the thinning of size (S i) and encoding
bs represents an injection from (sx :< x) to (sy :< x) provided that the bit at
position Z of bs is set, and that the rest of the bit pattern (obtained by a right
shift on bs) is a valid thinning of size i from sx to sy.


The Drop constructor is structured the same way, except that it insists the 
bit at position Z should not be set. 


We can readily use this relation to prove that some basic encodings are valid
representations of useful thinnings. 


Examples of Invariant proofs For instance, we can always define a thinning 
from the empty scope to an arbitrary scope sy. 


none : (sy : SnocList a) -> Th [<] sy
none sy = MkTh (length sy) 0 (none sy)


The encoding of this thinning is 0 because every variable is being discarded
and its bigEnd is the length of the outer scope sy. The validity proof is provided
by the none lemma proven below. We once again use Idris 2’s overloading to give
the same name to functions that play similar roles but at different types.


none : (sy : SnocList a) -> Invariant (length sy) 0 [<] sy
none [<] = Done
none (sy :< y) = Drop (none sy) y


The proof proceeds by induction over the outer scope sy. If it is empty, we 
can simply use the constructor for the empty thinning. Otherwise we can invoke 
Drop on the induction hypothesis. This all typechecks because (testBit 0 Z)
computes to False and so the nb proof can be constructed automatically by
Idris 2’s proof search (cf. section 3.2), and (0 `shiftR` 1) evaluates to 0 which
means the induction hypothesis has exactly the right type. 

The definition of the identity thinning is a bit more involved. For a scope of 
size n, we are going to need to generate a bit pattern consisting of n ones. We 
define it in two steps. First, cofull defines a bit pattern of k zeros followed by 
infinitely many ones by shifting k places to the left a bit pattern of ones only. 
Then, we obtain full by taking the complement of cofull. 


cofull : Nat -> Integer
cofull n = oneBits `shiftL` n

full : Nat -> Integer
full n = complement (cofull n)
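

The same two-step construction can be replayed on Python integers, which are also unbounded and behave like infinite two's complement bit patterns. In this sketch (an illustration, with -1 standing in for oneBits and `~` for complement) the loop tests, on a finite range only, the two facts that the validity proof of the identity thinning relies on.

```python
def cofull(n):
    """All ones shifted left by n: n zeros in the low bits, ones above."""
    return -1 << n          # -1 is the integer whose bits are all ones

def full(n):
    """Exactly n ones in the low bits, zeros above."""
    return ~cofull(n)

print(cofull(3), full(3))   # -8 7

for n in range(64):
    assert full(n) == (1 << n) - 1
    assert full(n + 1) & 1 == 1           # bit 0 of a non-empty full is set
    assert full(n + 1) >> 1 == full(n)    # a right shift yields the smaller full
```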


We can then define the identity thinning for a scope of size n by pairing (full 
n) as the encoding and n as the bigEnd. 


ones : (sx : SnocList a) -> Th sx sx 
ones sx = let n : Nat; n = length sx in MkTh n (full n) (ones sx) 


The bulk of the work is once again in the eponymous lemma proving that 
this encoding is valid.


ones : (sx : SnocList a) ->
  let n = length sx in Invariant n (full n) sx sx
ones [<] = Done
ones (sx :< x) =
  let 0 nb = eqToSo (testBitFull (S (length sx)) Z) in
  Keep (rewrite shiftRFull (length sx) in ones sx) x


This proof proceeds once more by induction on the scope. If the scope is 
empty then once again the constructor for the empty thinning will do. In the non- 
empty case, we first appeal to an auxiliary lemma (not shown here) to construct 
a proof nb that the bit at position Z for a non-zero full integer is known to 
be True. We then need to use another lemma to cast the induction hypothesis 
which mentions (full (length sx)) so that it may be used in a position where 
we expect a proof talking about (full (length (sx :< x)) `shiftR` 1).


Properties of the Invariant relation This relation has a lot of convenient 
properties. 

First, it is proof irrelevant: any two proofs that the same i, bs, sx, and sy 
are related are provably equal. Consequently, equality on Th values amounts to 
equality of the bigEnd and encoding values. In particular it is cheap to test 
whether a given thinning is the empty or the identity thinning. 

Second, it can be inverted [12] knowing only two bits: whether the natural
number is zero and what the value of the bit at position Z of the encoding is.
This is what allowed us to efficiently implement the view function by using these 
two checks and then inverting the Invariant proof to gain access to the proof 
that the remainder of the thinning’s encoding is valid. We will see in section 5.5 
that this leads to efficient runtime code for the view. 


5.4 Choose Your Own Abstraction Level 


Access to both the high-level View and the internal Invariant relation means 
that programmers can pick the level of abstraction at which they want to work. 
They may need to explicitly manipulate bits to implement key operators that 
are used in performance-critical paths but can also stay at the highest level for 
more negligible operations, or when proving runtime irrelevant properties. 

In the previous section we saw simple examples of these bit manipulations 
when defining none (using the constant 0 bit pattern) and ones using bit shifting 
and complement to form an initial segment of 1s followed by 0s.

Other natural examples include the meet and join of two thinnings sharing
the same wider scope. The join can for instance be thought of either as a function
defined by induction on the first thinning and case analysis on the second, emitting
a Keep constructor whenever either of the inputs does. Or we can observe
that the bit pattern in the join is the disjunction of the inputs’ bit patterns and
prove a lemma about the Invariant relation instead. This can be visualised as
follows: in each column the join is a full disc (●) whenever either of the inputs is.

The join is of particular importance because it appears when we convert
an ‘opened’ view of a term into its co-de Bruijn counterpart. As we mentioned 
earlier, co-de Bruijn terms in an arbitrary scope are represented by the pairing 
of a term indexed by its precise support with a thinning embedding this sup- 
port back into the wider scope. When working with such a representation, it 
is convenient to have access to an ‘opened’ view where the outer thinning has 
been pushed inside therefore exposing the term’s top-level constructor, ready for 
case-analysis. 

The following diagram shows the correspondence between an ‘opened’ ap- 
plication node using the view (the diamond ‘$’ node) with two subterms both 
living in the outer scope and its co-de Bruijn form (the circular ‘$’ node) with 
an outer thinning selecting the term support.

The outer thinning of the co-de Bruijn term is obtained precisely by computing
the join of the respective outer thinnings of the ‘opened’ application’s
function and argument.

These explicit bit manipulations will be preserved during compilation and
thus deliver more efficient code.
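
The bit-level view of the join and meet, and the conversion of an ‘opened’ application, can be sketched in Python as follows. Only the join-as-disjunction observation comes from the text above. The function names are hypothetical, and the `restrict` step, which re-expresses each subterm's thinning relative to the application's support, is an assumption about how the inner thinnings would be obtained rather than something the paper spells out.

```python
def join(bs, cs):
    """Variables kept by either thinning (both into the same wider scope)."""
    return bs | cs

def meet(bs, cs):
    """Variables kept by both thinnings (both into the same wider scope)."""
    return bs & cs

def restrict(bs, support, big_end):
    """Re-express bs, a sub-pattern of support, relative to the variables
    that support keeps (one output bit per kept variable)."""
    out, pos = 0, 0
    for i in range(big_end):
        if (support >> i) & 1:
            out |= ((bs >> i) & 1) << pos
            pos += 1
    return out

def close_app(big_end, fun_th, arg_th):
    """Outer thinning of an application, together with the thinnings of its
    function and argument into the application's support."""
    outer = join(fun_th, arg_th)
    return (outer,
            restrict(fun_th, outer, big_end),
            restrict(arg_th, outer, big_end))

outer, f, a = close_app(4, 0b0101, 0b0100)
print(bin(outer), bin(f), bin(a))   # 0b101 0b11 0b10
```

In this example the function uses variables 0 and 2 and the argument only variable 2, so the application's support has two variables: the function keeps both of them and the argument keeps only the second.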


5.5 Compiled Code 


The following code block shows the JavaScript code that is produced when com- 
piling the view function. We chose to use the JavaScript backend rather than 
e.g. the ChezScheme one because it produces fairly readable code. We have mod- 
ified the backend to also write comments reminding the reader of the type of 
the function being defined and the data constructors the natural number tags 
correspond to. These changes are now available to all in Idris 2 version 0.6.0. 

The only manual modifications we have performed are the inlining of a func- 
tion corresponding to a case block, renaming variables and property names to 
make them human-readable, introducing the $tail definitions to make lines 
shorter, and slightly changing the layout. 


/* Thin.Smart.view : (th : Th sx sy) -> View th */
function Thin_Smart_view($th) {
  switch($th.bigEnd) {
    case 0n: return {h: 0 /* Done */};
    default: {
      const $predBE = ($th.bigEnd-1n);
      const $test = choose(notEq(($th.encoding&1n), 0n));
      switch($test.tag) {
        case 0: /* Left */ {
          const $tail = $th.encoding>>1n;
          return { tag: 1 /* Keep */
                 , val: {bigEnd: $predBE, encoding: $tail}}; }
        case 1: /* Right */ {
          const $tail = $th.encoding>>1n;
          return { tag: 2 /* Drop */
                 , val: {bigEnd: $predBE, encoding: $tail}}; }
}}}}


Readers can see that the compilation process has erased all of the indices and 
the proofs showing that the invariant tying the efficient runtime representation 
to the high-level specification is maintained. A thinning is represented at run- 
time by a JavaScript object with two properties corresponding to Th’s runtime 
relevant fields: bigEnd and encoding. Both store a JavaScript BigInt (one
corresponding to the Nat, the other to the Integer). For instance the thinning 
[01101] would be at runtime { bigEnd: 5n, encoding: 13n }. 

The view proceeds in two steps. First if the bigEnd is 0n then we know the
thinning is empty and can immediately return the Done constructor. Otherwise
we know the thinning to be non-empty and so we can compute the big end of its
tail ($predBE) by subtracting one from the non-zero bigEnd. We can then inspect
the bit at position 0 to decide whether to return a Keep or a Drop constructor.
This is performed by using a bit mask to zero out all the other bits ($th.encoding&1n)
and checking whether the result is zero. If it is not equal to 0 then we emit Keep
and compute the $tail of the thinning by shifting the original encoding to drop
the 0th bit. Otherwise we emit Drop and compute the same tail.

By running view on this [01101] thinning, we would get back (Keep [0110]), 
that is to say { tag: 1, val: { bigEnd: 4n, encoding: 6n } }. 

Thanks to Idris 2’s implementation of Quantitative Type Theory we have 
managed to manufacture a high level representation that can be manipulated 
like a classic inductive family using smart constructors and views without giving 
up an inch of control on its runtime representation. 

The remaining issues such as the fact that we form the view’s constructors 
only to immediately take them apart thus creating needless allocations can be 
tackled by reusing Wadler’s analysis (section 12 of [24]). 


6 Conclusion 


We have seen that inductive families provide programmers with ways to root out 
bugs by enforcing strong invariants. Unfortunately these families can get in the
way of producing performant code despite existing optimisation passes erasing
redundant or runtime irrelevant data. This tension has led us to take advantage 
of Quantitative Type Theory in order to design a library combining the best of 
both worlds: the strong invariants and ease of use of inductive families together 
with the runtime performance of explicit bit manipulations. 


6.1 Related Work 


For historical and ergonomic reasons, idiomatic code in Coq tends to centre on
programs written in a subset of the language quite close to OCaml and then
prove properties about these programs in the runtime irrelevant Prop fragment. 
This can lead to awkward encodings when the unrefined inputs force the user to 
consider cases which ought to be impossible. Common coping strategies involve 
relaxing the types to insert a modicum of partiality e.g. returning an option type 
or taking an additional input to be used as the default return value. This ap- 
proach completely misses the point of type-driven development. We benefit from 
having as much information as possible available during interactive editing. This 
information not only helps tremendously getting the definitions right by ensuring 
we always maintain vital invariants thus making invalid states unrepresentable, it 
also gives programmers access to type-driven tools and automation. Thankfully 
libraries such as Equations [20,21] can help users write more dependently typed 
programs, by taking care of the complex encoding required in Coq. A view-based 
approach similar to ours but using Prop instead of the zero quantity ought to be 
possible. We expect that the views encoded this way in Coq will have an even 
worse computational behaviour given that Equations uses a sophisticated elab- 
oration process to encode dependent pattern-matching into Gallina. However 
Coq does benefit from good automation support for unfolding lemmas, inversion 
principles, and rewriting by equalities. It may compensate for the awkwardness 
introduced by the encoding. 

Prior work on erasure [22] has the advantage of offering a fully automated 
analysis of the code. The main inconvenience is that users cannot state explicitly 
that a piece of data ought to be runtime irrelevant and so they may end up 
inadvertently using it which would prevent its erasure. Quantitative Type Theory 
allows us users to explicitly choose what is and is not runtime relevant, with 
the quantity checker keeping us true to our word. This should ensure that the 
resulting program has a much more predictable complexity. 

A somewhat related idea was explored by Brady, McKinna, and Hammond in 
the context of circuit design [7]. In their verification work they index an efficient 
representation (natural numbers as a list of bits) by its meaning as a unary 
natural number. All the operations are correct by construction as witnessed by 
the use of their unary counterparts acting as type-level specifications. In the end 
their algorithms still process the inductive family instead of working directly 
with binary numbers. This makes sense in their setting where they construct 
circuits and so are explicitly manipulating wires carrying bits. By contrast, in our 
motivating example we really want to get down to actual (unbounded) integers 
rather than linked lists of bits. 

6.2 Limitations and Future Work


Overall we found this case study using Idris 2, a state of the art language based 
on Quantitative Type Theory, very encouraging. The language implementation 
is still experimental but none of the issues are intrinsic limitations. We hope to 
be able to push this line of work further, tackling the following limitations and 
exploring more advanced use cases. 


Limitations Unfortunately it is only propositionally true that (view (keep th 
x)) computes to (Keep th x) (and similarly for done/Done and drop/Drop). This 
means that users may need to manually deploy these lemmas when proving the 
properties of functions defined by pattern matching on the result of calling the 
view function. This annoyance would disappear if we had the ability to extend 
Idris 2’s reduction rules with user-proven equations as implemented in Agda and 
formally studied by Cockx, Tabareau, and Winterhalter [10]. 

In this paper’s case study, we were able to design the core Invariant relation 
making the invariants explicit in such a way that it would be provably proof 
irrelevant. This may not always be possible given the type theory currently 
implemented by Idris 2. Adding support for a proof-irrelevant sort of propositions 
(see e.g. Altenkirch, McBride, and Swierstra’s work [3]) could solve this issue 
once and for all. 

The Idris 2 standard library thankfully gave us access to a polished pure 
interface to explicitly manipulate an integer’s bits. However these built-in oper- 
ations came with no built-in properties whatsoever. And so we had to postulate 
a (minimal) set of axioms and prove a lot of useful corollaries ourselves. There is 
even less support for other low-level operations such as reading from a read-only 
array, or manipulating pointers. 

We also found the use of runtime irrelevance (the 0 quantity) sometimes
frustrating. Pattern-matching on a runtime irrelevant value in a runtime relevant 
context is currently only possible if it is manifest for the compiler that the value 
could only arise using one of the family’s constructors. In non-trivial cases this 
is unfortunately merely provable rather than self-evident. Consequently we
are forced to jump through hoops to appease the quantity checker, and end up 
defining complex inversion lemmas to bypass these limitations. This could be 
solved by a mix of improvements to the typechecker and meta-programming 
using prior ideas on automating inversion [12,15,19]. 


Future work We are planning to explore more memory-mapped representations 
equipped with a high level interface. 

We already have experimental results demonstrating that we can use a read- 
only array as a runtime representation of a binary search tree. Search can be 
implemented as a proven-correct high level decision procedure that is seemingly 
recursively exploring the "tree". At runtime however, this will effectively execute 
like a classic search by dichotomy over the array. 
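
The idea can be illustrated outside of Idris 2 with a small TypeScript sketch. It is not the experimental implementation mentioned above and carries no proofs; the names `Slice`, `TreeView`, `view` and `member` are hypothetical. A slice of a sorted read-only array is exposed through a view as either a leaf or a node, so the search reads as a recursive tree traversal while the work performed is a binary search.

```typescript
// Hypothetical names throughout; a sketch, not the paper's Idris 2 code.
// A half-open slice [lo, hi) of a sorted read-only array stands for a subtree.
interface Slice { readonly lo: number; readonly hi: number }

type TreeView =
  | { kind: "leaf" }
  | { kind: "node"; left: Slice; value: number; right: Slice };

// Expose a slice as a tree node: the middle element acts as the root.
function view(arr: readonly number[], s: Slice): TreeView {
  if (s.lo >= s.hi) return { kind: "leaf" };
  const mid = s.lo + Math.floor((s.hi - s.lo) / 2);
  return {
    kind: "node",
    left: { lo: s.lo, hi: mid },
    value: arr[mid],
    right: { lo: mid + 1, hi: s.hi },
  };
}

// Reads as a recursive descent over the "tree"; runs as a binary search.
function member(
  arr: readonly number[],
  target: number,
  s: Slice = { lo: 0, hi: arr.length },
): boolean {
  const t = view(arr, s);
  if (t.kind === "leaf") return false;
  if (target === t.value) return true;
  return member(arr, target, target < t.value ? t.left : t.right);
}
```

No tree is ever allocated: each call to `view` only computes two index pairs.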

More generally, we expect that a lot of the work on programming on serialised 
data done in LoCal [23] thanks to specific support from the compiler can be 
done as-is in a QTT-based programming language. Indeed, QTT’s type system
is powerful enough that tracking these invariants can be done purely in library 
code. 

In the short term, we would like to design a small embedded domain specific 
language giving users the ability to more easily build and take apart products 
and sums efficiently represented in the style we presented here. Staging would 
help here to ensure that the use of the eDSL comes at no runtime cost. There 
are plans to add type-enforced staging to Idris 2, thus really making it the ideal 
host language for our project. 

Our long term plan is to go beyond read-only data and look at imperative 
programs proven correct using separation logic and see how much of this
after-the-fact reasoning can be brought back into the types to enable a high-level
correct-by-construction programming style that behaves the same at runtime. 

Abstract. Gradualizing System F has been widely discussed. A big challenge is
to preserve relational parametricity and/or the gradual guarantee. Most past work 
has focused on the preservation of parametricity, but often without the gradual 
guarantee. A few recent works satisfy both properties by giving up System F syn- 
tax, or with some restrictions and the introduction of sophisticated mechanisms 
in the dynamic semantics. 

While parametricity is important for polymorphic languages, most mainstream 
languages typically do not satisfy it, for a variety of different reasons. In this 
paper, we explore the design space of polymorphic languages that satisfy the 
gradual guarantee, but do not preserve parametricity. When parametricity is not 
a goal, the design of polymorphic gradual languages can be considerably simpli- 
fied. Moreover, it becomes easy to add features that are of practical importance, 
such as mutable references. We present a new gradually typed polymorphic calculus,
called $\lambda^{G}_{gpr}$, with mutable references and with an easy proof of the gradual
guarantee. In addition, compared to other gradual polymorphism work, $\lambda^{G}_{gpr}$ is
defined using a Type-Directed Operational Semantics (TDOS), which allows the
dynamic semantics to be defined directly instead of elaborating to a target cast
language. $\lambda^{G}_{gpr}$ and all the proofs in this paper are formalized in Coq.


Keywords: Gradual Typing - Type System - Polymorphism. 


1 Introduction 


Statically typed languages can statically detect potential errors in programs, but must 
necessarily be conservative and reject some well-behaved programs. With dynamically 
typed languages, all programs are accepted, which offers a great amount of flexibility. 
However, the accepted dynamic programs include programs with type errors, making 
it harder to detect programs that are ill-behaved because of type errors. Considering 
the weaknesses and advantages of static and dynamic type systems, many approaches 
have been proposed to integrate the two styles [1,7,35,22,8]. Gradual typing [31,35]
provides a smooth integration of the two styles and has been under active research in 
the programming languages community. In addition to the type soundness property, a 
gradual language should behave as a static language if it is fully annotated. Conversely, 
it should behave as a dynamic language for fully dynamic programs. Importantly, the 
gradual guarantee [32] has been proposed to ensure a smooth transition between static 
and dynamic typing. 

The importance of System F as a foundation for programming languages with poly- 
morphism naturally leads to the question of whether it is possible to gradualize it. Vari- 
ous researchers have explored this question. In this line of research, a long-standing goal 
has been how to preserve relational parametricity [28]. Parametricity ensures a uniform
behavior for all instantiations of polymorphic functions, and is an important property of 
System F. In addition it is also desirable to preserve the gradual guarantee [32], which 
is recognized as an important property for gradual languages. Unlike System F, where 
no dynamic mechanism is needed to ensure parametricity, with gradualized versions of 
System F this is no longer the case. Ahmed et al. [3] showed that parametricity can be 
enforced using a dynamic sealing mechanism at runtime. They prove parametricity, but 
the gradual guarantee is not discussed. Igarashi et al. [17] improved on the dynamic 
sealing approach and proposed a more efficient mechanism. While the gradual guarantee
has been discussed, it was left as a conjecture. Toro et al. [37] even proved that
the gradual guarantee and parametricity are incompatible. By giving up the traditional
System F syntax, New et al. [24] proved the gradual guarantee and parametricity by using
user-provided sealing annotations, but this requires resorting to syntax that is not based 
on System F. Finally, Labrada et al. [20] proved the gradual guarantee and parametricity 
by inserting sealing with some restrictions. For instance, only base and variable types 
can be used to instantiate type applications. 


While parametricity is highly valued and it is guaranteed in practice in some func- 
tional languages, many mainstream programming languages — such as Java, TypeScript 
or Flow — do not have parametricity. In mainstream languages the value of paramet- 
ric polymorphism, and its ability to express a whole family of functions in a reusable 
and type-safe manner is certainly recognized. However, such languages are imperative 
and come with a variety of programming language features (such as unrestricted forms 
of mutable state, exceptions, parallelism and concurrency mechanisms, reflection, etc.) 
that make it hard to apply reasoning principles known in functional programming. In 
particular, most of those features are known to be highly challenging to deal with in 
the presence of parametricity [2,18,23]. This makes it non-obvious how to design a lan- 
guage with all those features, while preserving parametricity, in the first place. More- 
over, preserving parametricity may require extra dynamic checks at runtime, which for 
implementations where performance is a critical factor may discourage implementers 
from doing such checks. Therefore all the aforementioned programming languages support
System F-like mechanisms to deal with polymorphism and benefit from the reuse
afforded by polymorphism. However, the reasoning principles that arise from polymorphism,
such as parametricity, are discarded, and parametricity is not enforced.


In particular, programming languages such as TypeScript or Flow, which support 
some form of gradual/optional typing, and are widely used in practice, do not support 
parametricity. Figure 1 encodes an example from Ahmed et al.’s work [3], which was 
used to illustrate the parametricity challenge in gradual typing, in TypeScript and Flow. 
In this program, the polymorphic function Ks has a polymorphic type: (X → Y → Y),
where X and Y are type variables. In a calculus with parametricity, we know that a
function with such type should always return the second argument or, in the presence
of runtime casts, return an error. In the program, Ks is a function that casts a dynamic
constant function (K) that returns the first argument, which violates parametricity. When
the TypeScript and Flow programs are run the first argument 2 is returned, illustrating
that neither language enforces parametricity. In a gradual language with parametricity
the result that we would expect is an error. Furthermore, even if we turn to Typed

    function K(x: any, y: any): any {
      return x;
    }
    function Ks<X, Y>(x: X, y: Y): Y {
      let CAST = (K as any) as ((x: X, y: Y) => Y);
      return CAST(x, y);
    }
    function run() {
      console.log(Ks<number, number>(2, 3));
    }

(a) TypeScript code.

    function K(x: any, y: any): any {
      return x;
    }
    function Ks<X, Y>(x: X, y: Y): Y {
      let CAST = ((K: any): ((x: X, y: Y) => Y));
      return CAST(x, y);
    }
    function run() {
      console.log(Ks(2, 3));
    }

(b) Flow code.


Fig. 1: Ahmed et al. [3] program for illustrating parametricity in TypeScript and Flow. 


Racket [36], which is a well-established gradual language used in both gradual typing 
research and in practice, the result is similar and 2 is returned: 


    (: K Any)
    (define K (λ (x) (λ (y) x)))
    (define Ks
      (cast K (All (X Y) (-> X (-> Y Y)))))
    ((Ks 2) 3)


Therefore Typed Racket does not enforce parametricity either. 

In this paper, we explore the more pragmatic design space of polymorphic gradual 
languages with the gradual guarantee, but no parametricity. We believe that such de- 
signs are relevant because many practical language designs do not support parametric- 
ity, but support various other programming features instead. Dropping the requirement 
for parametricity enables us to explore language designs with many relevant practi- 
cal features, while being in line with current designs for existing practical gradually 
typed languages. In particular, this paper studies the combination of parametric poly- 
morphism, gradual typing and references. We show that, when parametricity is not a 
goal, the design of gradually polymorphic languages can be simplified, making it easier 
to add features such as references. Moreover, the gradual guarantee, which has been shown
to be quite problematic in all existing calculi with gradual polymorphism, is simple to
achieve. We present a standard static calculus with polymorphism and mutable references
called $\lambda_{gpr}$. Then we introduce the gradual counterpart, called $\lambda^{G}_{gpr}$.

The approach that we follow to give the dynamic semantics to $\lambda^{G}_{gpr}$ is to use the
recently proposed Type-Directed Operational Semantics (TDOS) [16,42]. In contrast,
traditionally the semantics of a gradually typed language is defined by elaboration to a
target cast calculus such as the blame calculus [39]. In other words, the dynamic se- 
mantics of the gradual source language is given indirectly by translating to the target 
language. As Ye et al. [42] show, TDOS avoids such indirection and uses bidirectional
typing and type annotations to enforce both implicit and explicit casting at runtime in 
gradually typed languages. 

In summary, we make the following contributions in this paper: 


- The $\lambda^{G}_{gpr}$ calculus: A gradual calculus with polymorphism and mutable references.
  The $\lambda^{G}_{gpr}$ calculus is the gradual counterpart of the $\lambda_{gpr}$ calculus. Both
  $\lambda^{G}_{gpr}$ and $\lambda_{gpr}$ are shown to be type sound and deterministic.

- Gradual guarantee for $\lambda^{G}_{gpr}$. We prove the gradual guarantee for $\lambda^{G}_{gpr}$. The proof
  is simple, in contrast to previous work on gradual polymorphism,
  where the gradual guarantee was a major obstacle.

- A TDOS extension. TDOS has been applied to gradual typing before [42]. However,
  the previous work on TDOS for gradual typing only works in a purely functional,
  simply typed calculus. Our work shows that the TDOS approach can incorporate
  other features, including polymorphism and references.

- A mechanical formalization in the Coq theorem prover. All the calculi and
  proofs in this paper have been mechanically formalized in the Coq theorem prover.


The Coq formalization can be found in the supplementary materials of this paper: 


https://www.zenodo.org/badge/latestdoi/581421930


2 Overview 


This section provides background on gradual polymorphic calculi and calculi with gradual
references, and presents the key ideas of our static system ($\lambda_{gpr}$) with polymorphism and
references and its gradual counterpart ($\lambda^{G}_{gpr}$).


2.1 Background 


Gradual References. Mutable references read or write content into a memory cell. 
A common set of operations is: allocating a memory cell (ref e); updating references 
(e1 := e2) and reading the content from a reference (!e). Locations (o) point to the
memory cell. For a reference value ref 1, a new location (o) is generated and value 1
is stored in the cell at the location o. If 2 is assigned to this location (o := 2), the cell
value is updated to 2. Later, when we read this cell (!o), 2 is returned. Siek et al. [31]
defined an invariant consistency relation for reference types. Reference types are only 
consistent with themselves. For example: 


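$$(\lambda x.\ (x := 2) : \mathsf{Ref}\ \star \to \mathsf{Ref}\ \star)\ (\mathsf{ref}\ 1) \qquad \text{Rejected!} \quad \mathsf{Ref}\ \mathsf{Int} \nsim \mathsf{Ref}\ \star$$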


Although the type Int is consistent with $\star$, it does not mean that Ref Int is consistent
with Ref $\star$. Therefore, the argument type is not consistent with the function input, and
the program is rejected. Herman et al. [14] proposed a gradually typed lambda source 
language with references, which defines the dynamic semantics by elaborating to a co- 
ercion calculus. The above program is allowed in their calculus. They define variant 
consistency where if A is consistent with B then Ref A is consistent with Ref B. In their 
calculus, casts are combined to achieve space-efficiency. Furthermore, Siek et al. [33]
explored monotonic references with variant consistency. Their main consideration is
space efficiency. No runtime overhead is imposed in the statically typed part of programs.
None of the above works considers the gradual guarantee.
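
The difference between the two treatments of reference types can be made concrete with a small sketch. It is an illustration only, not taken from any of the cited calculi: the type representation `Ty` and the function names are hypothetical, and the flag `variantRefs` merely selects between the invariant reading (reference types are consistent only with themselves) and the variant reading (Ref A is consistent with Ref B whenever A is consistent with B).

```typescript
// Hypothetical type representation; "dyn" plays the role of the unknown type.
type Ty =
  | { tag: "int" }
  | { tag: "dyn" }
  | { tag: "arrow"; dom: Ty; cod: Ty }
  | { tag: "ref"; elem: Ty };

const int: Ty = { tag: "int" };
const dyn: Ty = { tag: "dyn" };
const ref = (elem: Ty): Ty => ({ tag: "ref", elem });

// Syntactic equality of types.
function equal(a: Ty, b: Ty): boolean {
  if (a.tag === "arrow" && b.tag === "arrow") {
    return equal(a.dom, b.dom) && equal(a.cod, b.cod);
  }
  if (a.tag === "ref" && b.tag === "ref") return equal(a.elem, b.elem);
  return a.tag === b.tag && (a.tag === "int" || a.tag === "dyn");
}

// Consistency, with a switch between invariant and variant reference types.
function consistent(a: Ty, b: Ty, variantRefs: boolean): boolean {
  if (a.tag === "dyn" || b.tag === "dyn") return true;
  if (a.tag === "int" && b.tag === "int") return true;
  if (a.tag === "arrow" && b.tag === "arrow") {
    return (
      consistent(a.dom, b.dom, variantRefs) &&
      consistent(a.cod, b.cod, variantRefs)
    );
  }
  if (a.tag === "ref" && b.tag === "ref") {
    return variantRefs
      ? consistent(a.elem, b.elem, variantRefs) // variant: compare contents
      : equal(a.elem, b.elem); // invariant: contents must be identical
  }
  return false;
}
```

With `variantRefs` set to `false`, `consistent(ref(int), ref(dyn), false)` fails, which corresponds to the rejected program above; with `true` the same check succeeds.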

Toro and Tanter [38] showed how to employ the Abstracting Gradual Typing (AGT)
[12] methodology to design a gradually typed calculus with mutable references ($\lambda_{\widetilde{REF}}$).
Their dynamic semantics of the source language is defined by translating to an evidence-based
calculus. They prove a bisimulation with the coercion calculus by Herman et al. [14].
$\lambda_{\widetilde{REF}}$ is proved to satisfy the gradual guarantee. The consistency of $\lambda_{\widetilde{REF}}$ is also variant.


Gradual Polymorphism. Gradual polymorphism is a popular topic. Researchers have 
been working in this area for a long time. Prior work has focused on two key properties: 
relational parametricity [28] and the gradual guarantee [32]. Relational parametricity 
ensures that all instantiations of a polymorphic value behave uniformly. The gradual
guarantee ensures that less dynamic programs behave the same as more static programs. 
Satisfying these two properties at once has been shown to be problematic. Ahmed et
al. [3] showed that a naive combination of the unknown type $\star$ and type substitution
breaks relational parametricity. They show the problem using a simple expression with
two casts. To simplify the presentation, we ignore blame labels. Suppose that
$K^{\star} = \lceil \lambda x. \lambda y. x \rceil$, the dynamically typed constant function, is cast to a polymorphic type:


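$$K^{\star} : \star \Rightarrow \forall X. \forall Y. X \to Y \to X \qquad\qquad K^{\star} : \star \Rightarrow \forall X. \forall Y. X \to Y \to Y$$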


The notation $e : A \Rightarrow B$, borrowed from the blame calculus [29], means cast expression
e from type A to type B. The constant function $K^{\star}$ returns the first argument.
Considering relational parametricity, a value of type $\forall X. \forall Y. X \to Y \to X$ should
be a constant function which always returns the first argument, while a value of type
$\forall X. \forall Y. X \to Y \to Y$ should return the second argument. Therefore, the first cast succeeds
and the second cast should fail. However, if these two casts are applied to the
arguments in the usual way employing type substitutions, then we obtain the following:


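$$\begin{array}{ll}
 & (K^{\star} : \star \Rightarrow \forall X. \forall Y. X \to Y \to X)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3 \\
\hookrightarrow^{*} & (K^{\star} : \star \Rightarrow \mathsf{Int} \to \mathsf{Int} \to \mathsf{Int})\ 2\ 3 \\
\hookrightarrow^{*} & 2 \\[1ex]
 & (K^{\star} : \star \Rightarrow \forall X. \forall Y. X \to Y \to Y)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3 \\
\hookrightarrow^{*} & (K^{\star} : \star \Rightarrow \mathsf{Int} \to \mathsf{Int} \to \mathsf{Int})\ 2\ 3 \\
\hookrightarrow^{*} & 2
\end{array}$$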


The second cast succeeds and returns the first argument, which breaks parametricity. 
The reason for this behavior is that, after the type substitution, the polymorphic in- 
formation is lost. Note that, as we have seen in Section 1, this is exactly how various 
practical languages (TypeScript, Flow and Typed Racket) behave. 

Much of the work on gradual polymorphism aims at addressing the above prob- 
lem. That is, for the second cast we would like to obtain blame instead of 2, so that 
parametricity is not violated. While the preservation of parametricity is a worthy goal, 
it typically requires substantial changes to a calculus to ensure its preservation, since
naive direct type substitutions do not work. Furthermore, this also affects proofs, which 
can become significantly more complicated due to the changes in the calculus. To ad- 
dress this problem a well-known approach, originally proposed by Ahmed et al. [3], is to 
employ dynamic sealing. With dynamic sealing we do not do the substitution directly 
but record a fresh variable binding. However, even calculi that satisfy parametricity 
have to compromise on the important gradual guarantee property, or System F syntax, 
or be equipped with heavy forms of runtime evidence [37,20]. A thorough discussion of
various approaches is given in Section 6. 
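
To convey the intuition behind sealing, the following TypeScript sketch contrasts a cast that forgets the polymorphic information with one that seals arguments at runtime. It is a deliberately simplified toy, not the mechanism of Ahmed et al. [3] or of any other cited calculus; `Sealed`, `directCast` and `sealingCast` are hypothetical names, and the cast is hard-wired to the type ∀X.∀Y. X → Y → Y.

```typescript
// Toy model only: names and structure are assumptions made for illustration.
type Fn = (x: unknown, y: unknown) => unknown;

// A sealed value remembers which type variable sealed it.
class Sealed {
  constructor(readonly key: symbol, readonly payload: unknown) {}
}

function unseal(key: symbol, v: unknown): unknown {
  if (v instanceof Sealed && v.key === key) return v.payload;
  throw new Error("blame: result was not sealed by the expected type variable");
}

// The dynamically typed constant function: returns its first argument.
const K: Fn = (x, _y) => x;

// Direct substitution: after instantiation nothing distinguishes X from Y.
function directCast(f: Fn): Fn {
  return (x, y) => f(x, y);
}

// Sealing: the result must carry the seal of Y, the declared result type.
function sealingCast(f: Fn): Fn {
  return (x, y) => {
    const keyX = Symbol("X");
    const keyY = Symbol("Y");
    const result = f(new Sealed(keyX, x), new Sealed(keyY, y));
    return unseal(keyY, result);
  };
}
```

Under these assumptions `directCast(K)(2, 3)` evaluates to the first argument, as in the reduction above, whereas `sealingCast(K)(2, 3)` raises the blame error because the result carries the seal of X rather than Y.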


2.2 Key Ideas 


Our key design decision is to give up support for parametricity in exchange for a simpler 
calculus that is also easier to extend with other important practical features. In partic- 
ular, in our work we illustrate how to obtain a polymorphic gradually typed calculus, 
with gradual references and with the gradual guarantee. In contrast, none of the exist- 
ing gradually polymorphic calculi supports references and the gradual guarantee is only 
supported with restrictions [20]; or major modifications in the syntax and semantics of 
the language [24]; or not supported/proved at all [37,3,17]. 


A direct semantics with a TDOS. Our gradually typed calculus $\lambda^{G}_{gpr}$ has a direct semantics
by using a TDOS [15] approach. In $\lambda^{G}_{gpr}$, type annotations are operationally relevant
and they basically play a role similar to casts. Nevertheless, implicit casts should
also be enforced for a gradual calculus at runtime. Most previous work makes the implicit
casts explicit via the elaboration process. That is the reason why dynamic semantics
is not defined directly. We resort to bidirectional typing with inferred ($\Rightarrow$) and
checked ($\Leftarrow$) modes. Using the checking mode of bidirectional typing, the consistency
($\sim$) between values and the checked type is checked and enforced via an implicit cast.
At compile time, the flexible consistency relation allows more programs to be accepted,
while the checking mode signals casts that are needed at runtime. For example, in the
typing rule for applications:

$$\frac{\Sigma; \Gamma \vdash e_1 \Rightarrow A_1 \to A_2 \qquad \Sigma; \Gamma \vdash e_2 \Leftarrow A_1}{\Sigma; \Gamma \vdash e_1\, e_2 \Rightarrow A_2}\ \text{TYP-APP}$$

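The checking mode signals an implicit cast for the argument. The argument $e_2$ is checked
to be consistent with the type $A_1$ using the bidirectional subsumption rule:

$$\frac{\Sigma; \Gamma \vdash e \Rightarrow B \qquad \Gamma \vdash B \sim A}{\Sigma; \Gamma \vdash e \Leftarrow A}\ \text{TYP-SIM}$$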


For instance, $(\lambda x. x : \mathsf{Int} \to \mathsf{Int})\ (\mathsf{True} : \star)$ type-checks, but at run-time the invalid cast
of the value argument $(\mathsf{True} : \star)$ is detected and an error is reported.
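
The two phases in this example can be mimicked in a few lines of TypeScript. The sketch below is an assumption-laden simplification, not the TDOS rules of the calculus: it covers only Int, Bool and the unknown type, and the names `Annotated`, `consistent` and `applyChecked` are hypothetical. The static phase checks consistency between the argument's annotation and the parameter type; the runtime phase checks the actual value.

```typescript
// Simplified illustration with three types only; all names are hypothetical.
type Ty = "Int" | "Bool" | "Dyn";
type Raw = number | boolean;

// A value together with its annotation, such as (True : Dyn).
interface Annotated { value: Raw; ty: Ty }

function runtimeTypeOf(v: Raw): Ty {
  return typeof v === "number" ? "Int" : "Bool";
}

function consistent(a: Ty, b: Ty): boolean {
  return a === "Dyn" || b === "Dyn" || a === b;
}

function applyChecked(paramTy: Ty, f: (x: Raw) => Raw, arg: Annotated): Raw {
  // Static phase: the annotation must be consistent with the parameter type.
  if (!consistent(arg.ty, paramTy)) {
    throw new Error("static error: argument type is not consistent");
  }
  // Runtime phase: the implicit cast inspects the underlying value.
  if (paramTy !== "Dyn" && runtimeTypeOf(arg.value) !== paramTy) {
    throw new Error("blame: runtime cast failed");
  }
  return f(arg.value);
}
```

Applying an identity function with parameter type `"Int"` to `{ value: true, ty: "Dyn" }` passes the static phase and fails in the runtime phase, mirroring the example.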

Conservativity, no parametricity and direct substitutions. The $\lambda^{G}_{gpr}$ calculus is a conservative
extension of its static counterpart. Notably, our $\lambda^{G}_{gpr}$ is a simple polymorphic
calculus, without using mechanisms such as dynamic sealing and evidences. Instead,
since parametricity is not a goal, we can simply use direct type substitutions during 
reduction as follows: 


$$((\Lambda X.\, e : A) : \forall X.\, B)\ C \ \hookrightarrow\ e[X \mapsto C] : A[X \mapsto C] : B[X \mapsto C]$$


Our type application rule substitutes types directly, unlike in previous work with dynamic
sealing where a fresh type name variable is generated and stored in a global or local 
context. Dynamic sealing takes extra time and space. With a large enough number of 
type applications, the space consumption may go unbounded. 
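
Direct substitution on types is a plain structural traversal, as the following sketch suggests. It is not extracted from the Coq formalization; the representation `Ty` and the functions `subst` and `show` are hypothetical, and the substituted type is assumed to be closed so that variable capture cannot occur.

```typescript
// Hypothetical representation of types; "dyn" stands for the unknown type.
type Ty =
  | { tag: "int" }
  | { tag: "unit" }
  | { tag: "dyn" }
  | { tag: "var"; name: string }
  | { tag: "arrow"; dom: Ty; cod: Ty }
  | { tag: "forall"; binder: string; body: Ty }
  | { tag: "ref"; elem: Ty };

// subst(a, x, c) computes a[x ↦ c]; c is assumed closed (no capture).
function subst(a: Ty, x: string, c: Ty): Ty {
  switch (a.tag) {
    case "var":
      return a.name === x ? c : a;
    case "arrow":
      return { tag: "arrow", dom: subst(a.dom, x, c), cod: subst(a.cod, x, c) };
    case "forall":
      // An inner binder with the same name shadows x.
      return a.binder === x
        ? a
        : { tag: "forall", binder: a.binder, body: subst(a.body, x, c) };
    case "ref":
      return { tag: "ref", elem: subst(a.elem, x, c) };
    default:
      return a;
  }
}

function show(a: Ty): string {
  switch (a.tag) {
    case "int": return "Int";
    case "unit": return "Unit";
    case "dyn": return "*";
    case "var": return a.name;
    case "arrow": return `(${show(a.dom)} -> ${show(a.cod)})`;
    case "forall": return `(forall ${a.binder}. ${show(a.body)})`;
    case "ref": return `Ref ${show(a.elem)}`;
  }
}
```

Instantiating both X and Y with Int in X → Y → Y yields a type in which the two variables can no longer be told apart, which is exactly the loss of information that sealing is designed to prevent and that this design accepts.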

Gradual guarantee and references. Furthermore, $\lambda^{G}_{gpr}$ is mechanically formalized and
shown to have the gradual guarantee. Our application of the eager semantics and the
choice of value forms for $\lambda^{G}_{gpr}$ simplify the gradual guarantee. To prove the gradual
guarantee we need a precision ($\sqsubseteq$) relation. The gradual guarantee theorem needs to
ensure that if the more static program does not go wrong, then the less static program
should not go wrong as well. The precision relation is used to relate two programs,
which have different type information. Type precision compares the amount of static
type information for programs and types. A type is more precise than another if it is
more static. The unknown type ($\star$) is the least precise type, since we do not have any
static information about that type. Let’s consider two programs:


$$\lambda x.\ 1 : \mathsf{Int} \to \mathsf{Int}$$
$$\lambda x.\ 1 : \star \to \star$$


The first one is more precise than the second one because the second program is fully
dynamic. The value forms of $\lambda^{G}_{gpr}$ are annotated and include terms such as $i : \mathsf{Int}$ and
$(\lambda x.\, e : A \to B) : C$. The simplicity of the proof of the gradual guarantee is greatly
related to the choice of representation of values. In $\lambda^{G}_{gpr}$, the gradual guarantee theorem
can be formalized in a simple way with a lemma similar to a lemma proposed by Garcia
et al. [12]. The lemma states that if $e_1$ is more precise than $e_2$ and $e_1$ takes a step to $e_1'$
then $e_2$ takes a step to $e_2'$ and $e_1'$ is more precise than $e_2'$. With this lemma, we can infer
that two expressions related by precision have the same behavior. Thus, this lemma is
enough to obtain the dynamic gradual guarantee. Notably, $\lambda^{G}_{gpr}$ is extended with mutable
references using a form of variant consistency [14,38]. This is in contrast to the
previously discussed gradually polymorphic calculi where references are not supported.
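
Type precision can be sketched as a structural comparison. The code below is an illustration under assumptions: it covers only Int, functions, references and the unknown type, it treats reference types covariantly (an assumption made here, not a rule quoted from the formalization), and the names `Ty` and `morePrecise` are hypothetical.

```typescript
// Hypothetical representation; "dyn" stands for the unknown type.
type Ty =
  | { tag: "int" }
  | { tag: "dyn" }
  | { tag: "arrow"; dom: Ty; cod: Ty }
  | { tag: "ref"; elem: Ty };

// morePrecise(a, b) holds when a carries at least as much static
// information as b; every type is more precise than the unknown type.
function morePrecise(a: Ty, b: Ty): boolean {
  if (b.tag === "dyn") return true;
  if (a.tag === "int" && b.tag === "int") return true;
  if (a.tag === "arrow" && b.tag === "arrow") {
    return morePrecise(a.dom, b.dom) && morePrecise(a.cod, b.cod);
  }
  if (a.tag === "ref" && b.tag === "ref") {
    return morePrecise(a.elem, b.elem); // assumption: covariant in contents
  }
  return false;
}
```

For the two programs above, the annotation Int → Int is more precise than ⋆ → ⋆, but not the other way around.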


3 The $\lambda_{gpr}$ Calculus: Syntax, Typing and Semantics


In this section, we introduce the $\lambda_{gpr}$ calculus, which is a calculus with references
and polymorphism. The $\lambda_{gpr}$ calculus is an extension of System F with references and
is the static calculus that serves as a foundation for the gradual calculus in Section 4.


3.1 Syntax 


The syntax of the $\lambda_{gpr}$ calculus is shown in Figure 2.
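
Syntax

Types        $A, B ::= \mathsf{Int} \mid A \to B \mid X \mid \forall X.\, A \mid \mathsf{Unit} \mid \mathsf{Ref}\ A$
Expressions  $e ::= x \mid i \mid \lambda x : A.\, e \mid e : A \mid e_1\, e_2 \mid \Lambda X.\, e \mid e\, A \mid\ !e \mid e_1 := e_2 \mid \mathsf{ref}\ e \mid \mathsf{unit} \mid o$
Values       $v ::= i \mid \Lambda X.\, e \mid \lambda x : A.\, e \mid \mathsf{unit} \mid o$
Contexts     $\Gamma ::= \cdot \mid \Gamma, x : A \mid \Gamma, X$
Stores       $\mu ::= \cdot \mid \mu, o = v$
Locations    $\Sigma ::= \cdot \mid \Sigma, o : A$
Frame        $F ::= v\, \square \mid \square\, e \mid \square\, A \mid\ !\square \mid v := \square \mid \square := e \mid \mathsf{ref}\ \square \mid \square : A$


Fig. 2: $\lambda_{gpr}$ syntax.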


Types. Meta-variables A, B range over types. Types include base types (Int), function
types ($A \to B$), type variables (X), polymorphic types ($\forall X.\, A$), the unit type Unit and
reference types Ref A, which denotes a reference with type A.


Expressions. Meta-variable e ranges over expressions. Most of the expressions are
standard: variables (x), integers (i), annotations (e : A), applications ($e_1\, e_2$), type
applications (e A), dereferences (!e), assignments ($e_1 := e_2$), references (ref e), unit (unit),
locations (o), lambda abstractions ($\lambda x : A.\, e$) (which are annotated with input type A),
and type abstractions ($\Lambda X.\, e$).


Values. Meta-variable v ranges over values. A raw value is either an integer (i), a type
abstraction ($\Lambda X.\, e$), a lambda abstraction ($\lambda x : A.\, e$), a unit (unit) or a location (o).


Contexts, stores, locations and frames. The type context $\Gamma$ tracks the bound variables
x with their types and the bound type variables X. The location typing $\Sigma$ tracks the bound
locations o with their types, while the store $\mu$ tracks locations with their stored values
during the reduction process. Frames (F) include applications, type applications,
dereferences, assignments and references.
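
For readers who prefer code to grammars, the syntax can be transcribed into algebraic data types. The TypeScript encoding below is only one possible rendering and is not the representation used in the Coq development; all constructor and field names are hypothetical.

```typescript
// One possible encoding of the grammar; names are assumptions.
type Ty =
  | { tag: "int" }
  | { tag: "unit" }
  | { tag: "var"; name: string }
  | { tag: "arrow"; dom: Ty; cod: Ty }
  | { tag: "forall"; binder: string; body: Ty }
  | { tag: "ref"; elem: Ty };

type Expr =
  | { tag: "var"; name: string }                             // x
  | { tag: "lit"; value: number }                            // i
  | { tag: "lam"; param: string; paramTy: Ty; body: Expr }   // lambda abstraction
  | { tag: "anno"; expr: Expr; ty: Ty }                      // e : A
  | { tag: "app"; fn: Expr; arg: Expr }                      // e1 e2
  | { tag: "tlam"; binder: string; body: Expr }              // type abstraction
  | { tag: "tapp"; fn: Expr; tyArg: Ty }                     // e A
  | { tag: "deref"; target: Expr }                           // !e
  | { tag: "assign"; target: Expr; value: Expr }             // e1 := e2
  | { tag: "ref"; init: Expr }                               // ref e
  | { tag: "unit" }                                          // unit
  | { tag: "loc"; id: number };                              // o

// Values: integers, type abstractions, lambda abstractions, unit, locations.
function isValue(e: Expr): boolean {
  switch (e.tag) {
    case "lit":
    case "tlam":
    case "lam":
    case "unit":
    case "loc":
      return true;
    default:
      return false;
  }
}
```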


3.2 Type System 


Before introducing the type system, we show the well-formedness of types at the top of 
Figure 3. The well-formedness of types ensures that there are no free type variables and 
that each type variable is bound in the contexts. 


Typing relation. The typing relation of $\lambda_{gpr}$ is shown at the bottom of Figure 3. The type
system essentially includes the usual System F rules, except that they also propagate the
location typing context ($\Sigma$). Reference locations o are stored in the location typing context
$\Sigma$ (rule STYP-LOC). The bound type of locations indicates the type of stored values.
For instance, o points to 1 stored in a memory cell. The integer type for 1 is tracked by
the location o in the location typing context $\Sigma$. Other rules related to references such as
assignments (rule STYP-ASSIGN), references (rule STYP-REF) and dereferences (rule STYP-DEREF)
are standard. Annotation expressions (e : A) are not necessary for the static
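
$\Gamma \vdash A$ (Well-formedness of types)

$$\frac{}{\Gamma \vdash \mathsf{Int}}\ \text{TW-INT} \qquad \frac{}{\Gamma \vdash \mathsf{Unit}}\ \text{TW-UNIT} \qquad \frac{X \in \Gamma}{\Gamma \vdash X}\ \text{TW-VAR}$$

$$\frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \to B}\ \text{TW-ARR} \qquad \frac{\Gamma, X \vdash A}{\Gamma \vdash \forall X.\, A}\ \text{TW-ALL} \qquad \frac{\Gamma \vdash A}{\Gamma \vdash \mathsf{Ref}\ A}\ \text{TW-REF}$$

$\Sigma; \Gamma \vdash_s e : A$ (Typing rules for expressions)

$$\frac{}{\Sigma; \Gamma \vdash_s i : \mathsf{Int}}\ \text{STYP-LIT} \qquad \frac{}{\Sigma; \Gamma \vdash_s \mathsf{unit} : \mathsf{Unit}}\ \text{STYP-UNIT}$$

$$\frac{x : A \in \Gamma}{\Sigma; \Gamma \vdash_s x : A}\ \text{STYP-VAR} \qquad \frac{o : A \in \Sigma}{\Sigma; \Gamma \vdash_s o : \mathsf{Ref}\ A}\ \text{STYP-LOC}$$

$$\frac{\Sigma; \Gamma \vdash_s e : A}{\Sigma; \Gamma \vdash_s \mathsf{ref}\ e : \mathsf{Ref}\ A}\ \text{STYP-REF} \qquad \frac{\Sigma; \Gamma \vdash_s e : \mathsf{Ref}\ A}{\Sigma; \Gamma \vdash_s\ !e : A}\ \text{STYP-DEREF}$$

$$\frac{\Sigma; \Gamma \vdash_s e_1 : \mathsf{Ref}\ A \qquad \Sigma; \Gamma \vdash_s e_2 : A}{\Sigma; \Gamma \vdash_s e_1 := e_2 : \mathsf{Unit}}\ \text{STYP-ASSIGN}$$

$$\frac{\Sigma; \Gamma, x : A \vdash_s e : B}{\Sigma; \Gamma \vdash_s \lambda x : A.\, e : A \to B}\ \text{STYP-ABS} \qquad \frac{\Sigma; \Gamma \vdash_s e_1 : A_1 \to A_2 \qquad \Sigma; \Gamma \vdash_s e_2 : A_1}{\Sigma; \Gamma \vdash_s e_1\, e_2 : A_2}\ \text{STYP-APP}$$

$$\frac{\Sigma; \Gamma \vdash_s e : A}{\Sigma; \Gamma \vdash_s (e : A) : A}\ \text{STYP-ANNO}$$

$$\frac{\Sigma; \Gamma, X \vdash_s e : A}{\Sigma; \Gamma \vdash_s \Lambda X.\, e : \forall X.\, A}\ \text{STYP-TABS} \qquad \frac{\Gamma \vdash A \qquad \Sigma; \Gamma \vdash_s e : \forall X.\, B}{\Sigma; \Gamma \vdash_s e\, A : B[X \mapsto A]}\ \text{STYP-TAPP}$$


Fig. 3: The type system of the $\lambda_{gpr}$ calculus.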


system where the annotated types are syntactically equal (rule STYP-ANNO), but they will
play an important role in the gradual system and are included here.
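
A fragment of the typing relation can be read as a recursive function. The sketch below is illustrative only and is not derived from the Coq development: it omits polymorphism and annotations to stay short, and the names `typeOf`, `Ctx` and `LocTyping` are hypothetical. The two maps play the roles of the type context and the location typing context.

```typescript
// Monomorphic fragment only; all names are assumptions.
type Ty =
  | { tag: "int" }
  | { tag: "unit" }
  | { tag: "arrow"; dom: Ty; cod: Ty }
  | { tag: "ref"; elem: Ty };

type Expr =
  | { tag: "lit"; value: number }
  | { tag: "unit" }
  | { tag: "var"; name: string }
  | { tag: "loc"; id: number }
  | { tag: "lam"; param: string; paramTy: Ty; body: Expr }
  | { tag: "app"; fn: Expr; arg: Expr }
  | { tag: "ref"; init: Expr }
  | { tag: "deref"; target: Expr }
  | { tag: "assign"; target: Expr; value: Expr };

type Ctx = ReadonlyMap<string, Ty>;       // type context
type LocTyping = ReadonlyMap<number, Ty>; // location typing context

function tyEq(a: Ty, b: Ty): boolean {
  if (a.tag === "arrow" && b.tag === "arrow") {
    return tyEq(a.dom, b.dom) && tyEq(a.cod, b.cod);
  }
  if (a.tag === "ref" && b.tag === "ref") return tyEq(a.elem, b.elem);
  return a.tag === b.tag && a.tag !== "arrow" && a.tag !== "ref";
}

function typeOf(sigma: LocTyping, gamma: Ctx, e: Expr): Ty {
  switch (e.tag) {
    case "lit": return { tag: "int" };
    case "unit": return { tag: "unit" };
    case "var": {
      const t = gamma.get(e.name);
      if (t === undefined) throw new Error(`unbound variable ${e.name}`);
      return t;
    }
    case "loc": {
      // A location bound to A in the location typing has type Ref A.
      const t = sigma.get(e.id);
      if (t === undefined) throw new Error(`unknown location ${e.id}`);
      return { tag: "ref", elem: t };
    }
    case "lam": {
      const extended = new Map(gamma);
      extended.set(e.param, e.paramTy);
      return { tag: "arrow", dom: e.paramTy, cod: typeOf(sigma, extended, e.body) };
    }
    case "app": {
      const f = typeOf(sigma, gamma, e.fn);
      const a = typeOf(sigma, gamma, e.arg);
      if (f.tag !== "arrow" || !tyEq(f.dom, a)) throw new Error("ill-typed application");
      return f.cod;
    }
    case "ref": return { tag: "ref", elem: typeOf(sigma, gamma, e.init) };
    case "deref": {
      const r = typeOf(sigma, gamma, e.target);
      if (r.tag !== "ref") throw new Error("dereference of a non-reference");
      return r.elem;
    }
    case "assign": {
      const r = typeOf(sigma, gamma, e.target);
      const v = typeOf(sigma, gamma, e.value);
      if (r.tag !== "ref" || !tyEq(r.elem, v)) throw new Error("ill-typed assignment");
      return { tag: "unit" };
    }
  }
}
```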

Definition 1 defines well-formed stores ($\mu$) with respect to the location typing $\Sigma$, using
the typing relation:


Definition 1 (Well-formedness of the store with respect to $\Sigma$).
$\Sigma \vdash \mu$ if $\mathrm{dom}(\mu) = \mathrm{dom}(\Sigma)$ and $\Sigma; \cdot \vdash_s \mu(o) : \Sigma(o)$, for every $o \in \mu$.


A store is well-formed with respect to the location typing if the store and the location typing
have the same domain and, for each location o in the store, the bound value
$\mu(o)$ can be given the type bound in the location typing ($\Sigma(o)$).


3.3 Dynamic Semantics 


The operational semantics for the $\lambda_{gpr}$ calculus is shown in Figure 4 (we ignore the
gray parts for now). $\mu; e \hookrightarrow_s \mu'; e'$ represents the reduction relation, which states that e
with store $\mu$ reduces to $e'$ with the updated store $\mu'$. The reduction rules of $\lambda_{gpr}$ are
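
$\mu; e \hookrightarrow_s \mu'; e'$ (Operational semantics)

$$\frac{\mu; e \hookrightarrow_s \mu'; e'}{\mu; F[e] \hookrightarrow_s \mu'; F[e']}\ \text{STEP-EVAL} \qquad \frac{}{\mu; v : A \hookrightarrow_s \mu; v}$$

$$\frac{}{\mu; o := v \hookrightarrow_s \mu[o \mapsto v]; \mathsf{unit}}\ \text{STEP-ASSIGN} \qquad \frac{o = v \in \mu}{\mu;\ !o \hookrightarrow_s \mu; v}\ \text{STEP-DEREF}$$

$$\frac{}{\mu; (\Lambda X.\, e)\, A \hookrightarrow_s \mu; e[X \mapsto A]}\ \text{STEP-TAP} \qquad \frac{o \notin \mu}{\mu; \mathsf{ref}\ v \hookrightarrow_s \mu, o = v; o}\ \text{STEP-REFV}$$

$$\frac{}{\mu; (\lambda x : A.\, e)\, v \hookrightarrow_s \mu; e[x \mapsto v]}\ \text{STEP-BETA}$$


Fig. 4: Reduction rules for $\lambda_{gpr}$.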


straightforward. A reference value is bound in the store by a fresh location as shown
in rule STEP-REFV. The dereference rule extracts the bound value of the location in the
store (rule STEP-DEREF). Rule STEP-EVAL evaluates the frames. Let’s see how the example
$o_1 := (\Lambda X.\, (\lambda x : X.\, x)\ !o_2)\ \mathsf{Int}$ with the existing store $o_1 = 1, o_2 = 2$ reduces. 2 is
read from store $o_1 = 1, o_2 = 2$. After the type substitution, 2 is substituted into the
lambda. Then 2 is used to update the cell pointed to by $o_1$. Finally, the store becomes
$o_1 = 2, o_2 = 2$. The detailed steps are as follows:

$$\begin{array}{ll}
 & o_1 = 1, o_2 = 2;\ o_1 := (\Lambda X.\, (\lambda x : X.\, x)\ !o_2)\ \mathsf{Int} \\
\hookrightarrow_s & \{\text{by rule STEP-EVAL, rule STEP-DEREF}\} \\
 & o_1 = 1, o_2 = 2;\ o_1 := (\Lambda X.\, (\lambda x : X.\, x)\ 2)\ \mathsf{Int} \\
\hookrightarrow_s & \{\text{by rule STEP-TAP}\} \\
 & o_1 = 1, o_2 = 2;\ o_1 := (\lambda x : \mathsf{Int}.\, x)\ 2 \\
\hookrightarrow_s & \{\text{by rule STEP-BETA}\} \\
 & o_1 = 1, o_2 = 2;\ o_1 := 2 \\
\hookrightarrow_s & \{\text{by rule STEP-ASSIGN}\} \\
 & o_1 = 2, o_2 = 2;\ \mathsf{unit}
\end{array}$$
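
The store manipulations used in this derivation can be sketched as three pure functions, one per store-related rule. The code is an illustration under assumptions rather than an extraction of the formal semantics: stores are modelled as immutable maps from numeric locations to values, and the names `allocate`, `read` and `assign` are hypothetical.

```typescript
// Illustrative model of the store; all names are assumptions.
type Value = number | "unit" | { readonly loc: number };
type Store = ReadonlyMap<number, Value>;

// In the spirit of STEP-REFV: bind the value at a location not yet in the store.
function allocate(store: Store, v: Value): [Store, Value] {
  let fresh = 0;
  while (store.has(fresh)) fresh += 1;
  const next = new Map(store);
  next.set(fresh, v);
  return [next, { loc: fresh }];
}

// In the spirit of STEP-DEREF: look up the value bound to the location.
function read(store: Store, o: number): [Store, Value] {
  const v = store.get(o);
  if (v === undefined) throw new Error(`dangling location ${o}`);
  return [store, v];
}

// In the spirit of STEP-ASSIGN: update the cell and return unit.
function assign(store: Store, o: number, v: Value): [Store, Value] {
  if (!store.has(o)) throw new Error(`dangling location ${o}`);
  const next = new Map(store);
  next.set(o, v);
  return [next, "unit"];
}
```

Replaying the example amounts to calling `read` on location 2 and then `assign` on location 1 with the value obtained; the original store is left untouched because each step returns a new map.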
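
Theorem 1 shows that the $\lambda_{gpr}$ calculus is deterministic:

Theorem 1 (Determinism of $\lambda_{gpr}$). If $\Sigma; \cdot \vdash_s e : A$, $\Sigma \vdash \mu$, $\mu; e \hookrightarrow_s \mu_1; e_1$ and
$\mu; e \hookrightarrow_s \mu_2; e_2$ then $e_1 = e_2$ and $\mu_1 = \mu_2$.

Furthermore, the preservation Theorem 2 and progress Theorem 3 of the $\lambda_{gpr}$ calculus are
shown below:

Theorem 2 (Type Preservation of $\lambda_{gpr}$). If $\Sigma; \cdot \vdash_s e : A$, $\Sigma \vdash \mu$ and $\mu; e \hookrightarrow_s \mu'; e'$ then
$\Sigma'; \cdot \vdash_s e' : A$, $\Sigma' \vdash \mu'$ and $\Sigma' \supseteq \Sigma$.

Theorem 3 (Progress of $\lambda_{gpr}$). If $\Sigma; \cdot \vdash_s e : A$ then e is a value or $\exists e', \mu'.\ \mu; e \hookrightarrow_s \mu'; e'$.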
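
Typing modes $\Leftrightarrow\ ::=\ \Rightarrow\ \mid\ \Leftarrow$

$\Sigma; \Gamma \vdash_s e \Leftrightarrow A$ (Typing rules for expressions)

$$\frac{}{\Sigma; \Gamma \vdash_s i \Rightarrow \mathsf{Int}}\ \text{STY-LIT} \qquad \frac{}{\Sigma; \Gamma \vdash_s \mathsf{unit} \Rightarrow \mathsf{Unit}}\ \text{STY-UNIT}$$

$$\frac{x : A \in \Gamma}{\Sigma; \Gamma \vdash_s x \Rightarrow A}\ \text{STY-VAR} \qquad \frac{o : A \in \Sigma}{\Sigma; \Gamma \vdash_s o \Rightarrow \mathsf{Ref}\ A}\ \text{STY-LOC}$$

$$\frac{\Sigma; \Gamma \vdash_s e \Rightarrow A}{\Sigma; \Gamma \vdash_s \mathsf{ref}\ e \Rightarrow \mathsf{Ref}\ A}\ \text{STY-REF} \qquad \frac{\Sigma; \Gamma \vdash_s e \Rightarrow \mathsf{Ref}\ A}{\Sigma; \Gamma \vdash_s\ !e \Rightarrow A}\ \text{STY-DEREF}$$

$$\frac{\Sigma; \Gamma \vdash_s e_1 \Rightarrow \mathsf{Ref}\ A \qquad \Sigma; \Gamma \vdash_s e_2 \Leftarrow A}{\Sigma; \Gamma \vdash_s e_1 := e_2 \Rightarrow \mathsf{Unit}}\ \text{STY-ASSIGN}$$

$$\frac{\Sigma; \Gamma, x : A \vdash_s e \Rightarrow B}{\Sigma; \Gamma \vdash_s \lambda x : A.\, e \Rightarrow A \to B}\ \text{STY-ABS} \qquad \frac{\Sigma; \Gamma \vdash_s e_1 \Rightarrow A_1 \to A_2 \qquad \Sigma; \Gamma \vdash_s e_2 \Leftarrow A_1}{\Sigma; \Gamma \vdash_s e_1\, e_2 \Rightarrow A_2}\ \text{STY-APP}$$

$$\frac{\Sigma; \Gamma \vdash_s e \Leftarrow A}{\Sigma; \Gamma \vdash_s e : A \Rightarrow A}\ \text{STY-ANNO} \qquad \frac{\Sigma; \Gamma \vdash_s e \Rightarrow A}{\Sigma; \Gamma \vdash_s e \Leftarrow A}\ \text{STY-EQ}$$

$$\frac{\Sigma; \Gamma, X \vdash_s e \Rightarrow A}{\Sigma; \Gamma \vdash_s \Lambda X.\, e \Rightarrow \forall X.\, A}\ \text{STY-TABS} \qquad \frac{\Gamma \vdash A \qquad \Sigma; \Gamma \vdash_s e \Rightarrow \forall X.\, B}{\Sigma; \Gamma \vdash_s e\, A \Rightarrow B[X \mapsto A]}\ \text{STY-TAPP}$$


Fig. 5: Bidirectional typing for the $\lambda_{gpr}$ calculus.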


3.4 Bidirectional Typing 


We also present a set of bidirectional typing rules (shown in Figure 5) for $\lambda_{gpr}$. Although
bidirectional typing is not essential for $\lambda_{gpr}$, it is used later in the proofs of the gradual
typing criteria. The typing judgment is written $\Sigma; \Gamma \vdash_s e \Leftrightarrow A$. The expression $e$ is
inferred ($\Rightarrow$) or checked ($\Leftarrow$) against type $A$ under the typing context $\Gamma$ and the location typing
context $\Sigma$. Typing modes ($\Leftrightarrow$) comprise the inference mode ($\Rightarrow$) and the checking mode ($\Leftarrow$),
which are shown at the top of Figure 5. One extra rule is rule STY-EQ, which switches
modes. We proved that the two type systems are equivalent:


Lemma 1 (Typing Equivalence for $\lambda_{gpr}$). $\Sigma; \Gamma \vdash_s e : A$ iff $\Sigma; \Gamma \vdash_s e \Leftrightarrow A$.


4 The $\lambda^{G}_{gpr}$ Calculus


This section introduces the $\lambda^{G}_{gpr}$ calculus, which gradualizes the $\lambda_{gpr}$ calculus. Normally,
a gradually typed lambda calculus (GTLC) does not define the operational semantics
directly, but is elaborated to a cast calculus. $\lambda^{G}_{gpr}$ instead defines the dynamic semantics
directly using the TDOS approach [15]. $\lambda^{G}_{gpr}$ is proved to be type sound and to satisfy the
gradual guarantee. The calculus does not have parametricity, enabling simplifications
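
Syntax
Types            $A, B ::= \mathsf{Int} \mid A \to B \mid X \mid \forall X. A \mid \mathsf{Unit} \mid \mathsf{Ref}\ A \mid \star$
Expressions      $e ::= x \mid i \mid e : A \mid e_1\ e_2 \mid e\ A \mid\ !e \mid e_1 := e_2 \mid \mathsf{ref}\ e \mid \mathsf{unit} \mid o \mid \Lambda X. e : A \mid \lambda x. e : A \to B$
Results          $r ::= e \mid \mathsf{blame}$
Raw values       $u ::= i \mid \Lambda X. e : A \mid \lambda x. e : A \to B \mid \mathsf{unit} \mid o$
Values           $v ::= u : A$
Contexts         $\Gamma ::= \cdot \mid \Gamma, x : A \mid \Gamma, X$
Stores           $\mu ::= \cdot \mid \mu, o = v$
Location typings $\Sigma ::= \cdot \mid \Sigma, o : A$
Frames           $F ::= v\ \square \mid \square\ e \mid \square\ A \mid\ !\square \mid v := \square \mid \square := e \mid \mathsf{ref}\ \square$

$\Gamma \vdash A \sim B$ (Consistency)

S-UNIT: $\Gamma \vdash \mathsf{Unit} \sim \mathsf{Unit}$
S-VAR: $\dfrac{X \in \Gamma}{\Gamma \vdash X \sim X}$
S-Z: $\Gamma \vdash \mathsf{Int} \sim \mathsf{Int}$
S-DYNL: $\dfrac{\Gamma \vdash A}{\Gamma \vdash \star \sim A}$
S-DYNR: $\dfrac{\Gamma \vdash A}{\Gamma \vdash A \sim \star}$
S-ARR: $\dfrac{\Gamma \vdash A_1 \sim B_1 \qquad \Gamma \vdash A_2 \sim B_2}{\Gamma \vdash A_1 \to A_2 \sim B_1 \to B_2}$
S-FORALL: $\dfrac{\Gamma, X \vdash A \sim B}{\Gamma \vdash \forall X. A \sim \forall X. B}$
S-REF: $\dfrac{\Gamma \vdash A \sim B}{\Gamma \vdash \mathsf{Ref}\ A \sim \mathsf{Ref}\ B}$

Fig. 6: $\lambda^{G}_{gpr}$ syntax and consistency.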


in the calculus, and the addition of features such as gradual references, which none of 
the previous gradual calculi with polymorphism support. 


4.1 Static Semantics 


Syntax, type well-formedness and consistency. Figure 6 shows the syntax and consistency
of the $\lambda^{G}_{gpr}$ calculus. The gray parts are the same as in $\lambda_{gpr}$. With respect to $\lambda_{gpr}$,
the $\lambda^{G}_{gpr}$ calculus extends types with the unknown type $\star$. Because of the power of the
unknown type $\star$, dynamic type checking is required and run-time errors may be raised.
Therefore, in addition to expressions, $\lambda^{G}_{gpr}$ has the run-time error blame. Because the
gradual type system requires run-time checking, we need annotations
for type abstractions and lambda abstractions. Furthermore, due to the imprecision of
the unknown type $\star$, values are also annotated. Otherwise, examples such as $1 : \star$ are
troublesome. Because of the value forms, annotations are not included in frames, unlike
in the $\lambda_{gpr}$ calculus. We will explain the details later.


Well-formed types are extended with the following rule for the unknown type $\star$:

$\Gamma \vdash \star$

Notably, instead of syntactic equality, a more general relation called consistency
($\Gamma \vdash A \sim B$) is defined in $\lambda^{G}_{gpr}$. Every well-formed type is consistent with itself. The unknown
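
$\Sigma; \Gamma \vdash e \Leftrightarrow A$ (Typing rules for expressions)

TYP-LIT: $\Sigma; \Gamma \vdash i \Rightarrow \mathsf{Int}$
TYP-UNIT: $\Sigma; \Gamma \vdash \mathsf{unit} \Rightarrow \mathsf{Unit}$
TYP-VAR: $\dfrac{x : A \in \Gamma}{\Sigma; \Gamma \vdash x \Rightarrow A}$
TYP-LOC: $\dfrac{o : A \in \Sigma}{\Sigma; \Gamma \vdash o \Rightarrow \mathsf{Ref}\ A}$
TYP-REF: $\dfrac{\Sigma; \Gamma \vdash e \Rightarrow A}{\Sigma; \Gamma \vdash \mathsf{ref}\ e \Rightarrow \mathsf{Ref}\ A}$
TYP-DEREF: $\dfrac{\Sigma; \Gamma \vdash e \Rightarrow A_1 \qquad A_1 \triangleright \mathsf{Ref}\ A}{\Sigma; \Gamma \vdash\ !e \Rightarrow A}$
TYP-ASSIGN: $\dfrac{\Sigma; \Gamma \vdash e_1 \Rightarrow A_1 \qquad A_1 \triangleright \mathsf{Ref}\ A \qquad \Sigma; \Gamma \vdash e_2 \Leftarrow A}{\Sigma; \Gamma \vdash e_1 := e_2 \Rightarrow \mathsf{Unit}}$
TYP-ABS: $\dfrac{\Sigma; \Gamma, x : A \vdash e \Leftarrow B}{\Sigma; \Gamma \vdash \lambda x. e : A \to B \Rightarrow A \to B}$
TYP-APP: $\dfrac{\Sigma; \Gamma \vdash e_1 \Rightarrow A \qquad A \triangleright A_1 \to A_2 \qquad \Sigma; \Gamma \vdash e_2 \Leftarrow A_1}{\Sigma; \Gamma \vdash e_1\ e_2 \Rightarrow A_2}$
TYP-ANNO: $\dfrac{\Sigma; \Gamma \vdash e \Leftarrow A}{\Sigma; \Gamma \vdash e : A \Rightarrow A}$
TYP-SIM: $\dfrac{\Gamma \vdash A \sim B \qquad \Sigma; \Gamma \vdash e \Rightarrow A}{\Sigma; \Gamma \vdash e \Leftarrow B}$
TYP-TABS: $\dfrac{\Sigma; \Gamma, X \vdash e \Leftarrow A}{\Sigma; \Gamma \vdash \Lambda X. e : A \Rightarrow \forall X. A}$
TYP-TAPP: $\dfrac{\Gamma \vdash A \qquad \Sigma; \Gamma \vdash e \Rightarrow A_1 \qquad A_1 \triangleright \forall X. B}{\Sigma; \Gamma \vdash e\ A \Rightarrow B[X \mapsto A]}$

Matching functions:

| $A \triangleright A_1 \to A_2$ | $A \triangleright \forall X. A_1$ | $A \triangleright \mathsf{Ref}\ A_1$ |
| $A \to B \triangleright A \to B$ | $\forall X. A \triangleright \forall X. A$ | $\mathsf{Ref}\ A \triangleright \mathsf{Ref}\ A$ |
| $\star \triangleright \star \to \star$ | $\star \triangleright \forall X. \star$ | $\star \triangleright \mathsf{Ref}\ \star$ |

Fig. 7: The type system for the $\lambda^{G}_{gpr}$ calculus.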


type is consistent with any other well-formed type. Structural types such as functions,
references and polymorphic types are consistent if their type sub-components are
consistent. Note that for two reference types, consistency is variant: if $A$ and $B$ are
consistent then $\mathsf{Ref}\ A$ and $\mathsf{Ref}\ B$ are consistent. Unlike invariant consistency [31], types $A$ and
$B$ do not have to be the same. As usual, consistency is reflexive and symmetric, but not
transitive. We use the following abbreviation for consistency: $A \sim B \triangleq \cdot \vdash A \sim B$.
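
The consistency relation described above can be sketched as a small recursive check. The following Python sketch is illustrative only: the tuple encoding of types and the names `fun`, `ref`, `forall`, `var` and `consistent` are assumptions of this sketch rather than part of the calculus, and the well-formedness context is omitted by assuming well-formed types whose binders use the same names.

```python
# Hypothetical encoding (not from the paper): base types are strings,
# compound types are tuples tagged with their constructor.
INT, UNIT, DYN = "Int", "Unit", "*"   # DYN stands for the unknown type

def fun(a, b): return ("->", a, b)
def ref(a): return ("Ref", a)
def forall(x, a): return ("forall", x, a)
def var(x): return ("var", x)

def consistent(a, b):
    """A ~ B, following the shape of the consistency rules S-*."""
    if a == DYN or b == DYN:                 # S-DYNL / S-DYNR
        return True
    if isinstance(a, str) or isinstance(b, str):
        return a == b                        # S-UNIT / S-Z
    if a[0] != b[0]:                         # different type constructors
        return False
    if a[0] == "var":                        # S-VAR
        return a[1] == b[1]
    if a[0] == "forall":                     # S-FORALL (same binder name assumed)
        return a[1] == b[1] and consistent(a[2], b[2])
    # S-ARR / S-REF: all sub-components must be consistent
    return all(consistent(x, y) for x, y in zip(a[1:], b[1:]))
```

In this sketch `consistent(INT, DYN)` and `consistent(DYN, UNIT)` both hold while `consistent(INT, UNIT)` does not, which illustrates why consistency is not transitive; `consistent(ref(INT), ref(DYN))` illustrates variant consistency for references.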


Typing relation. Bidirectional typing is used to design the type system. Bidirectional
typing is not essential for $\lambda_{gpr}$ but it is necessary for $\lambda^{G}_{gpr}$. Annotation expressions ($e : A$)
and the checking mode ($\Leftarrow$) signal the use of casts (explicitly or implicitly) at run-time.

The typing rules of the $\lambda^{G}_{gpr}$ calculus are shown in Figure 7. They are almost the same
as the type system of $\lambda_{gpr}$. For rule TYP-APP, rule TYP-TAPP, rule TYP-ASSIGN and rule
TYP-DEREF, the unknown type $\star$ can be matched with, respectively, a dynamic function type
($\star \to \star$), a dynamic polymorphic type ($\forall X. \star$) and a dynamic reference type ($\mathsf{Ref}\ \star$).


In a system with gradual typing and the unknown type $\star$ we always have to consider
cases where the type may be unknown. For instance, in an application $e_1\ e_2$, $e_1$ can
infer a function type as usual, but it can also infer type $\star$ and still be well-typed. So, a
matching function ($A \triangleright B$) is needed to account for both possibilities. The table at the
bottom of Figure 7 shows the definition of the matching functions $A \triangleright B$. Note that we
overload the notation: there are three different matching functions, one per column of
the table, each employed by the corresponding rules. For example, rule TYP-DEREF
employs the matching function in the third column of the table. The first row of the table
depicts the form of the matching function, while the other two rows give its definition.
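
The three matching functions can be read as partial functions on types. The sketch below is a minimal Python rendering under an assumed tuple encoding of types (strings for base types, tagged tuples for compound types); the names `match_fun`, `match_forall` and `match_ref` are hypothetical, and returning `None` models the case where matching is undefined.

```python
DYN = "*"  # the unknown type (encoding assumed for this sketch)

def match_fun(a):
    """A |> A1 -> A2: return (A1, A2), or None when A does not match."""
    if a == DYN:
        return (DYN, DYN)                      # * |> * -> *
    if isinstance(a, tuple) and a[0] == "->":
        return (a[1], a[2])                    # A -> B |> A -> B
    return None

def match_forall(a):
    """A |> forall X. A1: return (X, A1), or None when A does not match."""
    if a == DYN:
        return ("X", DYN)                      # * |> forall X. * (binder name is arbitrary)
    if isinstance(a, tuple) and a[0] == "forall":
        return (a[1], a[2])                    # forall X. A |> forall X. A
    return None

def match_ref(a):
    """A |> Ref A1: return A1, or None when A does not match."""
    if a == DYN:
        return DYN                             # * |> Ref *
    if isinstance(a, tuple) and a[0] == "Ref":
        return a[1]                            # Ref A |> Ref A
    return None
```

A type checker for applications would, for example, infer a type for the function position, call `match_fun` on it, and reject the program only when the result is `None`.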

The checking mode rule TYP-SIM is generalized to check whether the inferred type $A$ and the
checked type $B$ are consistent. Note that rule TYP-SIM is the only rule in the checking
mode and, as such, does not overlap with anything else. Moreover, all the rules in the
inference mode are syntax directed. Therefore, the rules are directly implementable,
as usual for bidirectional type-checking rules. Note that in $\lambda^{G}_{gpr}$, annotation
expressions combined with consistency play an important role, allowing more programs
to be accepted. For instance, $(\lambda x. ((x : \star)\ 1) : \mathsf{Bool} \to \star)\ \mathsf{True}$ is accepted, but raises a
blame error at run-time. Note that dynamically typed lambdas $\lambda x. e$ are syntactic sugar
for $\lambda x. e : \star \to \star$. The use of this syntactic sugar enables us to encode the dynamically
typed lambda calculus (DTLC) [4] easily in $\lambda^{G}_{gpr}$.

Definition 2 shows dynamic type checking for raw and annotated values, which 
is done at run-time. Dynamic type checking for values exploits the annotations that 
are present at run-time, and does not make use of the typing relation. Dynamic type 
checking is essentially a constant time operation, with little cost (note that the function 
is not recursive). 


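Definition 2 (Dynamic type). $|u|_\mu = A$ and $|v|_\mu = A$ denote the dynamic type of the raw
and annotated values.

$|i|_\mu = \mathsf{Int}$
$|\lambda x. e : A \to B|_\mu = A \to B$
$|\Lambda X. e : A|_\mu = \forall X. A$
$|\mathsf{unit}|_\mu = \mathsf{Unit}$
$|o|_\mu = \mathsf{Ref}\ |v|_\mu$ when $o = v \in \mu$
$|u : A|_\mu = A$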


$|u|_\mu = A$ states that the dynamic type of the raw value $u$ is $A$ under store $\mu$. Notably,
for locations $o$, the dynamic type is defined by the dynamic type of the value bound
in the store. The other rules are straightforward. Lemma 2 shows that if a raw value can be
inferred with type $A$, then its dynamic type is $A$ as well.


Lemma 2 (Synthesis of Dynamic Types). For any raw value $u$, if $\Sigma \vdash \mu$ and
$\Sigma; \cdot \vdash u \Rightarrow A$ then $|u|_\mu = A$.


As in $\lambda_{gpr}$, a term typed using the inference mode is guaranteed to infer a unique
type. In addition, Lemma 3 shows that each well-typed term can be checked.


Lemma 3 (Synthesis principality). If $\Sigma; \Gamma \vdash e \Rightarrow A$ then there exists $B$ such that
$\Sigma; \Gamma \vdash e \Leftarrow B$ and $\Gamma \vdash A \sim B$.
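
$\mu; v \hookrightarrow_A \mu; r$ (Casting for values)

CASTING-SIM: $\dfrac{|u|_\mu \sim B}{\mu; u : A \hookrightarrow_B \mu; u : B}$
CASTING-NSIM: $\dfrac{\neg(|u|_\mu \sim B)}{\mu; u : A \hookrightarrow_B \mu; \mathsf{blame}}$

$\mu; v \hookrightarrow_{A,B} \mu; r$ (Double casting)

TLISTS-BASEB: $\dfrac{\mu; v \hookrightarrow_A \mu; \mathsf{blame}}{\mu; v \hookrightarrow_{A,B} \mu; \mathsf{blame}}$
TLISTS-CONS: $\dfrac{\mu; v \hookrightarrow_A \mu; v' \qquad \mu; v' \hookrightarrow_B \mu; r}{\mu; v \hookrightarrow_{A,B} \mu; r}$

Fig. 8: Casting for values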


4.2 Dynamic Semantics 


The dynamic semantics contains two parts. The first part is casting, which casts a value 
to another value with a target type. In casting the dynamic type of the value is the source 
type. The second part is the reduction rules. 


Casting. Figure 8 shows the casting rules of the $\lambda^{G}_{gpr}$ calculus. The judgment
$\mu; v \hookrightarrow_A \mu; r$ represents casting a value $v$ by type $A$ under store $\mu$. The dynamic type
of the raw value $u$ is checked for consistency with type $A$. If the two types are consistent, then the
intermediate type is removed and the raw value is annotated with the target type.
Otherwise, a run-time error is raised. For example, when $1 : \star$ is cast by type Bool, the
dynamic type of 1 is Int, which is not consistent with Bool, and blame is raised. In contrast,
when $1 : \star$ is cast by type Int, the dynamic type Int is consistent with type Int. Thus, type $\star$
is erased and 1 is annotated with type Int. Since a location $o$ is a raw value, its
dynamic type has to be obtained from the store $\mu$. Therefore, casting
uses the store. Casting by two types is shown at the bottom of Figure 8. It simply casts
by the types one by one, using the basic casting relation.
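
To make the casting relation concrete, here is a minimal Python sketch of the dynamic type function and of single and double casting. Everything about the encoding is an assumption of this sketch: types are strings or tagged tuples, raw values are tagged tuples, annotated values are `("ann", u, A)`, the store is a dictionary from location names to annotated values, and `Bool` is included as a base type only because the running example uses it.

```python
INT, UNIT, BOOL, DYN = "Int", "Unit", "Bool", "*"
BLAME = "blame"

def consistent(a, b):
    """A ~ B on the assumed encoding (binders compared by name)."""
    if a == DYN or b == DYN:
        return True
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    if a[0] != b[0]:
        return False
    if a[0] in ("var", "forall"):
        return a[1] == b[1] and all(consistent(x, y) for x, y in zip(a[2:], b[2:]))
    return all(consistent(x, y) for x, y in zip(a[1:], b[1:]))

def dyn_type(u, store):
    """|u|_mu for raw values, following Definition 2."""
    tag = u[0]
    if tag == "lit":
        return INT
    if tag == "unit":
        return UNIT
    if tag == "lam":                      # ("lam", x, body, A, B)
        return ("->", u[3], u[4])
    if tag == "tlam":                     # ("tlam", X, body, A)
        return ("forall", u[1], u[3])
    if tag == "loc":                      # ("loc", o): Ref of the bound value's annotation
        return ("Ref", store[u[1]][2])
    raise ValueError("not a raw value")

def cast(v, target, store):
    """Cast the annotated value v by `target` (CASTING-SIM / CASTING-NSIM)."""
    _, u, _old_annotation = v
    if consistent(dyn_type(u, store), target):
        return ("ann", u, target)         # old annotation is replaced by the target
    return BLAME

def cast2(v, a, b, store):
    """Double casting: cast by a, then by b, propagating blame."""
    r = cast(v, a, store)
    return BLAME if r == BLAME else cast(r, b, store)
```

With this sketch, casting `("ann", ("lit", 1), DYN)` by `BOOL` yields `BLAME`, while casting it by `INT` yields `("ann", ("lit", 1), INT)`, mirroring the two examples in the text.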


Reduction. The reduction rules of the $\lambda^{G}_{gpr}$ calculus are shown in Figure 9. Raw values
reduce to values, which are annotated with the dynamic type of the raw
value, by rule VSTEP-U. Due to this rule, annotations are not included in frames.
Annotated expressions are further dealt with by rule VSTEP-ANNO and rule VSTEP-ANNOP. In the
typing rules TYP-APP, TYP-TAPP, TYP-ASSIGN, and TYP-DEREF, type $\star$ is allowed
to match, respectively, a dynamic function, a polymorphic function or a reference type.
Moreover, we know that $\star$ is consistent with any type. Therefore, we should check
whether the internal values fail to match the wanted type structure. An example is the
ill-formed application $((1 : \star)\ 2)$, where the internal value (1) is not a lambda
abstraction. There are similar examples for type applications and assignments: $(1 : \star)\ \mathsf{Bool}$
and $(\mathsf{True} : \star) := 2$, where 1 is not a type abstraction and True is not a location. Using

$\mu; e \hookrightarrow \mu'; r$ (Operational semantics)

[Figure: inference rules VSTEP-EVAL, VSTEP-BLAME, VSTEP-ANNOP, VSTEP-BETA, VSTEP-BETAP, VSTEP-BETAD, VSTEP-ANNOV, VSTEP-TAP, VSTEP-TAPD, VSTEP-REFV, VSTEP-DEREF, VSTEP-DEREFP, VSTEP-ASSIGN, VSTEP-ASSIGNP, VSTEP-ASSIGND, VSTEP-U and VSTEP-ANNO]

Fig. 9: Reduction rules for $\lambda^{G}_{gpr}$.


rules VSTEP-BETAD, VSTEP-TAPD, VSTEP-DEREFP, and VSTEP-ASSIGND, we cast the value to
the corresponding dynamic types and filter out programs with errors. To apply a functional
value to an argument (rules VSTEP-BETA and VSTEP-BETAP), the argument type must be
consistent with the function input type $A_2$. Moreover, the expected type of the substituted
value is $A_1$. Thus, the argument value should be cast by $A_2$ and $A_1$, which may return a blame
error. To preserve the type, the substituted body is annotated with $B_1$ and $B_2$. When a
value $v$ is annotated with a type $A$, the type of the value must be consistent with type $A$,
and run-time checking is needed to validate consistency (rule VSTEP-ANNOV). A reference
value $\mathsf{ref}\ v$ is bound in the store at a fresh location $o$ (rule VSTEP-REFV). To obtain a
value from the store through its location, we use rule VSTEP-DEREF.
Note that in the typing rule for references:

$$\frac{\Sigma; \cdot \vdash o : A_1 \Rightarrow A_1 \qquad A_1 \triangleright \mathsf{Ref}\ A}{\Sigma; \cdot \vdash\ !(o : A_1) \Rightarrow A}\ \text{TYP-DEREF}$$


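The expected type is $A$ but the type of the bound value is only consistent with $A$. Thus we
annotate $v$ with type $A$. When assigning a value to replace the bound value in the reference using
rules VSTEP-ASSIGN and VSTEP-ASSIGNP:

$$\frac{A \triangleright \mathsf{Ref}\ A_2 \qquad \Sigma; \cdot \vdash o : A \Rightarrow A \qquad \Sigma; \cdot \vdash v_2 \Leftarrow A_2}{\Sigma; \cdot \vdash (o : A) := v_2 \Rightarrow \mathsf{Unit}}\ \text{TYP-ASSIGN}$$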


The value bound at location $o$ has type $A_1$, while the type of $v_2$ is consistent with type
$A_2$ and $A_2$ is consistent with $A_1$. The expected type of the value to be replaced is $A_1$, therefore
$v_2$ is cast by types $A_1$ and $A_2$. Note that the cast may result in blame. If a type is applied
to a polymorphic value (rule VSTEP-TAP):

$$\frac{B \triangleright \forall X. B_2 \qquad \Sigma; \cdot \vdash (\Lambda X. e : A) : B \Rightarrow B}{\Sigma; \cdot \vdash ((\Lambda X. e : A) : B)\ C \Rightarrow B_2[X \mapsto C]}\ \text{TYP-TAPP}$$

The expected type is $B_2[X \mapsto C]$ but the substituted expression $(e[X \mapsto C] : A[X \mapsto C])$
has type $A[X \mapsto C]$, so it is annotated with type $B_2[X \mapsto C]$.


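Properties of $\lambda^{G}_{gpr}$. $\lambda^{G}_{gpr}$ is deterministic (Theorem 4) and type sound (Theorem 5 and
Theorem 6).

Theorem 4 (Determinism of $\lambda^{G}_{gpr}$). If $\Sigma; \cdot \vdash e \Leftrightarrow A$, $\mu; e \hookrightarrow \mu_1; r_1$ and $\mu; e \hookrightarrow \mu_2; r_2$
then $r_1 = r_2$ and $\mu_1 = \mu_2$.

Theorem 5 (Type Preservation of $\lambda^{G}_{gpr}$). If $\Sigma; \cdot \vdash e \Leftrightarrow A$, $\Sigma \vdash \mu$ and $\mu; e \hookrightarrow \mu'; e'$
then $\Sigma'; \cdot \vdash e' \Leftrightarrow A$, $\Sigma' \vdash \mu'$ and $\Sigma' \supseteq \Sigma$.

Theorem 6 (Progress of $\lambda^{G}_{gpr}$). If $\Sigma; \cdot \vdash e \Leftrightarrow A$ then $e$ is a value or $\exists r, \mu'.\ \mu; e \hookrightarrow \mu'; r$.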


4.3 Gradual Typing Criteria 


Siek et al. [31,32] proposed a set of criteria for gradual typing systems. At one end of the
spectrum, a fully annotated gradually typed program should behave as a statically typed
program. Conversely, a gradually typed program without annotations should behave as
a dynamic program. Siek et al. also proposed the gradual guarantee, which states that making
annotations more or less precise should not change the behavior of programs.
Here we show that $\lambda^{G}_{gpr}$ has the gradual guarantee.

To prove the gradual guarantee, we define precision for types, expressions and
stores. At the top of Figure 10 is type precision $A \sqsubseteq B$, which states that type $A$ is
more precise than $B$. The unknown type $\star$ is less precise than any other type. Each
type is more precise than itself. The precision of functions, polymorphic functions and
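
$A \sqsubseteq B$ (Type Precision)

TP-UNIT: $\mathsf{Unit} \sqsubseteq \mathsf{Unit}$
TP-VAR: $X \sqsubseteq X$
TP-Z: $\mathsf{Int} \sqsubseteq \mathsf{Int}$
TP-DYN: $A \sqsubseteq \star$
TP-ARR: $\dfrac{A_1 \sqsubseteq B_1 \qquad A_2 \sqsubseteq B_2}{A_1 \to A_2 \sqsubseteq B_1 \to B_2}$
TP-FORALL: $\dfrac{A \sqsubseteq B}{\forall X. A \sqsubseteq \forall X. B}$
TP-REF: $\dfrac{A \sqsubseteq B}{\mathsf{Ref}\ A \sqsubseteq \mathsf{Ref}\ B}$

$e_1 \sqsubseteq e_2$ (Expression Precision)

EP-LIT: $i \sqsubseteq i$
EP-VAR: $x \sqsubseteq x$
EP-UNIT: $\mathsf{unit} \sqsubseteq \mathsf{unit}$
EP-O: $o \sqsubseteq o$
EP-REF: $\dfrac{e_1 \sqsubseteq e_2}{\mathsf{ref}\ e_1 \sqsubseteq \mathsf{ref}\ e_2}$
EP-DEREF: $\dfrac{e_1 \sqsubseteq e_2}{!e_1 \sqsubseteq\ !e_2}$
EP-ABS: $\dfrac{e_1 \sqsubseteq e_2 \qquad A_1 \sqsubseteq A_2 \qquad B_1 \sqsubseteq B_2}{\lambda x. e_1 : A_1 \to B_1 \sqsubseteq \lambda x. e_2 : A_2 \to B_2}$
EP-APP: $\dfrac{e_1 \sqsubseteq e_3 \qquad e_2 \sqsubseteq e_4}{e_1\ e_2 \sqsubseteq e_3\ e_4}$
EP-ASSIGN: $\dfrac{e_1 \sqsubseteq e_3 \qquad e_2 \sqsubseteq e_4}{e_1 := e_2 \sqsubseteq e_3 := e_4}$
EP-ANNO: $\dfrac{e_1 \sqsubseteq e_2 \qquad A_1 \sqsubseteq A_2}{e_1 : A_1 \sqsubseteq e_2 : A_2}$
EP-TABS: $\dfrac{e_1 \sqsubseteq e_2 \qquad A_1 \sqsubseteq A_2}{\Lambda X. e_1 : A_1 \sqsubseteq \Lambda X. e_2 : A_2}$
EP-TAPP: $\dfrac{e_1 \sqsubseteq e_2 \qquad A_1 \sqsubseteq A_2}{e_1\ A_1 \sqsubseteq e_2\ A_2}$

$\mu_1 \sqsubseteq \mu_2$ (Store Precision)

Rules SP-EMPTY and SP-NIL: empty stores are related, and
$\dfrac{\mu_1 \sqsubseteq \mu_2 \qquad v_1 \sqsubseteq v_2}{\mu_1, o = v_1 \sqsubseteq \mu_2, o = v_2}$

Fig. 10: Precision Relation.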


reference types holds if the precision of their sub-components holds. Note that the
precision of function types is “covariant” in the argument types: when comparing the
precision of the two programs

$\lambda x. 1 : \mathsf{Int} \to \mathsf{Int}$
$\lambda x. 1 : \star \to \mathsf{Int}$

we should say that the first one is more precise than the second one because the
input type of the second one is fully dynamic. Expression precision is shown in the
middle of Figure 10. The rules can mostly be derived from type precision. Each
expression is in a precision relation with itself. Structural expressions are in a precision
relation if their sub-expressions are related. Lastly, store precision, shown at the bottom
of Figure 10, holds if precision holds for the values in the stores.
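
Type precision is also easy to sketch as a recursive check. The Python fragment below is illustrative only and uses an assumed tuple encoding of types (strings for base types, tagged tuples for compound types); the name `more_precise` is hypothetical.

```python
DYN = "*"  # the unknown type (encoding assumed for this sketch)

def more_precise(a, b):
    """A ⊑ B: A is at least as precise as B, following the TP-* rules."""
    if b == DYN:                              # TP-DYN: every type is more precise than *
        return True
    if isinstance(a, str) or isinstance(b, str):
        return a == b                         # TP-UNIT / TP-Z
    if a[0] != b[0]:                          # different type constructors
        return False
    if a[0] == "var":                         # TP-VAR
        return a[1] == b[1]
    if a[0] == "forall":                      # TP-FORALL (same binder name assumed)
        return a[1] == b[1] and more_precise(a[2], b[2])
    # TP-ARR / TP-REF: precision is covariant in every component,
    # including function argument types
    return all(more_precise(x, y) for x, y in zip(a[1:], b[1:]))
```

For the two function types in the example above, `more_precise(("->", "Int", "Int"), ("->", "*", "Int"))` holds, while the converse does not.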

$\mu; e \hookrightarrow_s \mu'; e'$ (Operational semantics)

[Figure: inference rules STEP-EVAL, STEP-ANNOV, STEP-ASSIGN, STEP-DEREF, STEP-TAP, STEP-REFV, STEP-BETA, STEP-ANNO and STEP-U]

Fig. 11: Reduction rules for $\lambda_{gpr}$.


Static criteria. We show that the fully static type system of $\lambda^{G}_{gpr}$ is equivalent to the
$\lambda_{gpr}$ calculus (Theorem 7). We use the subscript $s$ to denote a relation from the static system in
case of ambiguity. Theorem 8 shows the static gradual guarantee of $\lambda^{G}_{gpr}$: if a more precise
program is well-typed then a less precise program should be well-typed with a less
precise type.

Theorem 7 (Equivalence for $\lambda^{G}_{gpr}$ (statics)). $\cdot; \cdot \vdash_s e \Leftrightarrow A$ if and only if $\cdot; \cdot \vdash e \Leftrightarrow A$.


Theorem 8 (Static Gradual Guarantee). If $e_1 \sqsubseteq e_2$ and $\cdot; \cdot \vdash e_1 \Leftrightarrow A$ then $\cdot; \cdot \vdash e_2 \Leftrightarrow B$
and $A \sqsubseteq B$.


Dynamic criteria. Theorem 9 says that fully static programs of the $\lambda^{G}_{gpr}$ calculus behave in
the same way as in $\lambda_{gpr}$ at run-time. To make the proofs easier, the reduction rules of the $\lambda_{gpr}$
calculus carry extra annotations to follow $\lambda^{G}_{gpr}$ (we denote this relation as $\hookrightarrow_s$). This means that
there are extra identical annotations, as shown in the gray parts of Figure 4. However, these
annotations are identical and they can be removed without affecting the final reduction
result. In addition, as in $\lambda^{G}_{gpr}$, values have annotations; raw values should step to
annotated values; and annotations are not included in frames. This requires a few extra
rules, which are shown in Figure 11.

Notably, $\lambda^{G}_{gpr}$ has the dynamic gradual guarantee (Theorem 10). The proof is simple
in comparison to the original proof by Siek et al. [32]. This simple theorem is formalized
following the work of Garcia et al. [12]. It says that if a more precise program with a
more precise store can reduce, then the less precise program with a less precise store can
also reduce. Furthermore, the resulting programs and stores should remain in the precision
relation.
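Theorem 9 (Equivalence for $\lambda^{G}_{gpr}$ (dynamic)). $\forall\ \cdot; \cdot \vdash_s e \Leftrightarrow A$,

- If $\mu; e \hookrightarrow_s \mu'; e'$ then $\mu; e \hookrightarrow \mu'; e'$.
- If $\mu; e \hookrightarrow \mu'; e'$ then $\mu; e \hookrightarrow_s \mu'; e'$.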


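Theorem 10 (Dynamic Gradual Guarantee). If $e_1 \sqsubseteq e_2$, $\mu_1 \sqsubseteq \mu_2$, $\Sigma; \cdot \vdash e_1 \Leftrightarrow A$,
$\cdot; \cdot \vdash e_2 \Leftrightarrow B$ and $\mu_1; e_1 \hookrightarrow \mu_1'; e_1'$ then there exist $e_2'$ and $\mu_2'$ such that
$\mu_2; e_2 \hookrightarrow \mu_2'; e_2'$, $e_1' \sqsubseteq e_2'$ and $\mu_1' \sqsubseteq \mu_2'$.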


5 Discussion 
In this section, we briefly discuss alternative designs and possible extensions. 


Preserving relational parametricity. An alternative design is a direct semantics for
gradual polymorphic calculi that preserves parametricity. We employ an eager
semantics similar to the AGT methodology, which is applied in the GSF calculus. Toro
et al. [37] analyzed the following example to show how parametricity is broken by the
naive use of dynamic sealing in the eager semantics:

$(\Lambda X. (\lambda x : X.\ \mathsf{let}\ y : \star = x\ \mathsf{in}\ \mathsf{let}\ z : \star = y\ \mathsf{in}\ z + 1))\ \mathsf{Int}\ 1$

The polymorphic function with type $\forall X. X \to \star$ breaks parametricity, which should
be detected at run-time by raising an error. However, the application of the function
reduces to 2. A fresh type name $\alpha$ is generated and is bound to the type Int.
The flow from $x$ to $y$ goes from type Int to type $\alpha$; the flow from $y$ to $z$ goes from type $\star$ to
type $\star$; and the flow from $x$ to $z$ goes from Int to $\star$. Each of these type flows is safe. Thus the
reason for the loss of parametricity is related to the loss of precise type information.
Consequently, dynamic sealing is not enough to enforce relational parametricity. For
the above example, GSF detects the error by refining evidences such as $\langle \alpha^{\mathsf{Int}}, \alpha^{\mathsf{Int}} \rangle$.
Importantly, in the type flow from $y$ to $z$, more precise types (Int and $\alpha^{\mathsf{Int}}$) instead of
$\star$ and $\star$ are obtained, so when moving from $x$ to $z$ the type changes from Int to $\alpha^{\mathsf{Int}}$.
When doing the addition, the run-time error is detected since the flow from $\alpha^{\mathsf{Int}}$ to Int
is not defined. A potential approach for us is to use tracked types ($A^{\langle B_1, B_2 \rangle}$), which are
similar to the refined evidences in the GSF calculus. Because $\lambda^{G}_{gpr}$ is a source language,
we do not have evidences, thus a possible approach is to record information in types.
For the above example, tracked types can track the unknown type with more precise
types from $y$ to $z$ to be Int and $\alpha^{\mathsf{Int}}$, which is $\star^{\langle \mathsf{Int}, \alpha^{\mathsf{Int}} \rangle}$, and then from $x$ to $z$ to be $\star^{\langle \mathsf{Int}, \alpha^{\mathsf{Int}} \rangle}$
as the refined evidences, and a run-time error is detected when doing the addition.


A space-efficient gradual polymorphic calculus. Ozaki et al. [27] explored the space
efficiency problem in the gradual polymorphic calculus. They extended the coercion
calculus ($\lambda C$) [29] with parametric polymorphism (called $\lambda C^{\forall}$). Dynamic sealing was
applied in $\lambda C^{\forall}$ to enforce relational parametricity. Consequently, a sequence of
coercions is allowed and they showed that it cannot be normalized to a smaller coercion.
In other words, the size of sequences is unbounded. Notably, they stated and proved
that $\lambda C^{\forall}$ cannot be space-efficient when dynamic sealing is supported. Furthermore,
they conjectured that the gradual polymorphic calculus with dynamic sealing cannot
become space-efficient. Our $\lambda^{G}_{gpr}$ calculus substitutes types directly, as in the traditional
semantics, without employing dynamic sealing. Moreover, the eager semantics is
applied. Thus we believe that it is possible for our $\lambda^{G}_{gpr}$ calculus to be a space-efficient
gradual polymorphic calculus. Two tentative and promising rules are as follows:

$$\frac{A \sim C}{e : A : B : C \Longrightarrow e : A : C} \qquad \frac{\neg(A \sim C)}{e : A : B : C \Longrightarrow \mathsf{blame}}$$

With the above two rules, annotations are removed or an error is raised, to achieve
the space-efficiency goal. Surprisingly, with these two rules, it seems possible to have a
space-efficient gradual references calculus naturally. We intend to explore this in the
future.
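
The effect of the two tentative rules on a chain of three annotations can be sketched as follows. This Python fragment is only an illustration of the rules as stated above, not an implementation of the calculus: the tuple encoding of types, the opaque expression argument and the names `consistent` and `collapse` are assumptions of this sketch.

```python
DYN = "*"        # the unknown type (encoding assumed for this sketch)
BLAME = "blame"

def consistent(a, b):
    """A ~ B on strings (base types) and tagged tuples (compound types)."""
    if a == DYN or b == DYN:
        return True
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    if a[0] != b[0]:
        return False
    if a[0] in ("var", "forall"):
        return a[1] == b[1] and all(consistent(x, y) for x, y in zip(a[2:], b[2:]))
    return all(consistent(x, y) for x, y in zip(a[1:], b[1:]))

def collapse(e, a, b, c):
    """e : A : B : C  ==>  e : A : C when A ~ C, otherwise blame.

    The intermediate annotation B is dropped in both cases, so the
    chain of annotations does not grow.
    """
    if consistent(a, c):
        return (e, a, c)
    return BLAME
```

For instance, `collapse("e", "Int", "*", "Int")` keeps only the outer annotations, whereas `collapse("e", "Int", "*", "Unit")` yields `BLAME`.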


Implicit polymorphic references. Implicit (higher-rank) polymorphism [10,26,19] is
pervasive in theoretical and practical programming languages. Existing gradual
polymorphic calculi are mainly explicitly polymorphic. One exception is the work of Xie et
al. [41]. In explicit polymorphism, a polymorphic type is not related to any of
its instantiated types, whereas in implicit polymorphism they are related. Xie et al. [41]
designed a source gradual implicit polymorphism calculus with consistent subtyping, but
their dynamic semantics is defined by translation to the well-known polymorphic blame
calculus ($\lambda B^{\forall}$) [3], without a proof of the dynamic gradual guarantee. A possible
extension of Xie et al.’s work is to support implicit polymorphism with a direct dynamic
semantics, and to explore the dynamic gradual guarantee and parametricity properties.
However, it is well-known that a naive combination of implicit polymorphism and
references leads to an unsound language. A possible solution is to limit polymorphism to
syntactic let-bound values, as adopted by Standard ML [40].


Alternative forms of values. In our calculus, all values are annotated, such as $1 : \mathsf{Int}$ or
$(\lambda x. x : \mathsf{Int} \to \mathsf{Int}) : \mathsf{Int} \to \mathsf{Int}$. This introduces some overhead as some annotations are
redundant. We can have an alternative and workable form of values as follows:


$v ::= u \mid u : \star \mid (\Lambda X. e : A) : \forall X. B \mid (\lambda x. e : A_1 \to B_1) : A_2 \to B_2$


The above value form removes redundant annotations such as integers (1 : Int). This is 
good for performance, but it would make the proof of dynamic gradual guarantee harder. 
However, the resulting calculus with fewer annotations should have an equivalent se- 
mantics to our calculus, and would be a better candidate for guiding an implementation. 


6 Related Work 


Gradual typing. Gradual typing is a term coined by Siek et al. [31]. The unknown type
?, which we represent as $\star$, is the new notion introduced to a gradual type system to
integrate dynamic and static typing. By using the unknown type $\star$, equality on types
is lifted to consistency. Any type is consistent with type $\star$. Therefore, run-time type
checking is needed for a gradually typed lambda calculus. Traditionally, the dynamic
semantics of a gradual language is defined by elaborating to a target language, which
includes cast calculi [39,34,29,11,3] and coercion calculi [13,14,30,29,27].


Garcia et al. [12] proposed the abstracting gradual typing (AGT) approach, which
allows for deriving a gradual type system by lifting the static type system. They argue
that elaborating to a target language has weaknesses, and avoid a target language
in their calculus by using intrinsic terms. Our $\lambda^{G}_{gpr}$ defines the dynamic
semantics directly without using intrinsic terms, employing instead an approach
based on type-directed operational semantics (TDOS). TDOS was proposed by Huang
et al. [15] to design calculi with the merge
operator and intersection types. Ye et al. [42] explored the use of TDOS in gradual
typing. In TDOS, type annotations are relevant at run-time and can affect the semantics,
unlike many traditional calculi where types are not run-time relevant. With a TDOS we
can design a gradually typed calculus without elaboration to a cast calculus, since the
semantics can be given directly. Our $\lambda^{G}_{gpr}$ employs the eager semantics for higher-order
values, following an approach similar to AGT. Ye et al. only consider a TDOS for a
simply typed, purely functional language. Our work shows that the TDOS approach can be
extended to important features, such as polymorphism and references.


Gradual typing with references. Many languages with static and dynamic typing,
employing some form of optional typing, support references. These include Flow [8],
Dart [6] and TypeScript [5]. However, with optional typing, run-time checking is not
performed for fully dynamic programs, leading to unsoundness with respect to the static
type system. Siek et al. [31] already considered mutable references,
but in a very simple setting without annotation expressions. Furthermore, their gradually
typed lambda calculus is elaborated to a target language to define the dynamic
semantics. Herman et al. [14] designed a coercion calculus with references, which is space
efficient. A gradualizer, introduced by Cimini and Siek [9], can systematically derive a gradual
static type system and cast insertion with references. Toro et al. [38] designed
a source gradual typing system with references and a corresponding target language
using the Abstracting Gradual Typing (AGT) methodology. They designed the
target language as a space-efficient calculus and proved the gradual guarantee. Our $\lambda^{G}_{gpr}$ is the first
polymorphic gradually typed language with references.


Existing gradual polymorphic calculi. In the following we summarize some of the 
solutions to the problem of preserving parametricity and gradual guarantee in gradual 
polymorphic calculi and the changes that these solutions entail. 


Dynamic sealing. Ahmed et al. [3] solved the problem in Section 2 by using dynamic
sealing, inspired by the work of Matthews et al. [21]. They proposed the polymorphic
blame calculus [3] (we present it as $\lambda B^{\forall}$), which is a widely used cast calculus with
dynamic sealing. The most interesting construct of $\lambda B^{\forall}$ is the named type binding
$\nu X := A. t$, which is introduced to record the instantiated type of a type variable. The programs




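in Section 2 behave as expected in $\lambda B^{\forall}$:

$(K^{\star} : \star \Rightarrow \forall X. \forall Y. X \to Y \to X)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3$
$\longrightarrow^{*} \nu Y := \mathsf{Int}.\ \nu X := \mathsf{Int}.\ (2 : X \Rightarrow \star : \star \Rightarrow X)$
$\longrightarrow^{*} 2$

$(K^{\star} : \star \Rightarrow \forall X. \forall Y. X \to Y \to Y)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3$
$\longrightarrow^{*} \nu Y := \mathsf{Int}.\ \nu X := \mathsf{Int}.\ (2 : X \Rightarrow \star : \star \Rightarrow Y)$
$\longrightarrow^{*} \mathsf{blame}$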


The first program succeeds and returns the first argument, while the second program
fails, since the polymorphic information is recorded as $X := \mathsf{Int}$ and $Y := \mathsf{Int}$ in type
bindings and the original type variable names are preserved in the casts. Notably, for
higher-order values, $\lambda B^{\forall}$ follows the lazy semantics of the blame calculus [39,29]. That
is, for a function value, the checking is delayed until an argument value is applied. This
unfortunately results in unbounded space consumption for higher-order casts [13,14].

As Xie et al. [41] pointed out, the compatibility relation of $\lambda B^{\forall}$ mixes explicit and
implicit polymorphism to some extent, since it employs the following rule:

$$\frac{A[X \mapsto \star] \prec B}{\forall X. A \prec B}$$

This compatibility rule of $\lambda B^{\forall}$ allows $\forall X. X \to X$ to be compatible with any statically
instantiated type such as $\mathsf{Int} \to \mathsf{Int}$ and $\mathsf{Bool} \to \mathsf{Bool}$. These types are not related in
System F, so $\lambda B^{\forall}$ is not a conservative extension of System F. The gradual guarantee
has not been discussed for $\lambda B^{\forall}$, but the authors show the parametricity property.


The $F_G$ and $F_C$ calculi. Igarashi et al. [17] improved on $\lambda B^{\forall}$. They designed a source
calculus ($F_G$) and a target calculus ($F_C$), which is a conservative extension of System
F. The dynamic semantics of $F_G$ is indirect and defined by translation to $F_C$. $F_G$ does
not relate $\forall X. X \to X$ with static instantiations, but only with the dynamic instantiation
$\star \to \star$. The type $\star \to \star$ is called quasi-polymorphic, since it is an instantiation of
$\forall X. X \to X$, similarly to what happens with implicit polymorphism. However, a type
such as $\mathsf{Int} \to \mathsf{Int}$ is not quasi-polymorphic. Instead of binding types locally by ($\nu X :=
A. t$), they made the type bindings global. Their reduction form $\Sigma \triangleright f \longrightarrow \Sigma' \triangleright f'$ is
augmented with a store, which records the bound type variables $X := A$. The above
example reduces in $F_C$ as follows.


$\cdot \triangleright (K^{\star} : \star \Rightarrow \forall X. \forall Y. X \to Y \to X)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3$
$\longrightarrow^{*} \cdot \triangleright (\Lambda X. \Lambda Y. K^{\star} : \star \Rightarrow X \to Y \to X)\ \mathsf{Int}\ \mathsf{Int}\ 2\ 3$
$\longrightarrow^{*} X := \mathsf{Int}, Y := \mathsf{Int} \triangleright (K^{\star} : \star \Rightarrow X \to Y \to X)\ \mathsf{Int}\ 2\ 3$
$\longrightarrow^{*} 2$
Furthermore, they argue that type bindings generated locally lead to run-time overheads.
Their observation is that type bindings are not required for every substitution, but only
for casts with the dynamic type ($\star$). Therefore they employ two kinds of type
variables, which are distinguished by labels. One kind is static type variables (X::S) and the
other kind is gradual type variables (X::G). Type application for static type abstractions
does not generate type bindings, which are only generated for gradual type abstractions.
Parametricity and the static gradual guarantee are proved, although the proofs are not
mechanized. However, the dynamic gradual guarantee is left as a conjecture. In addition,
their static gradual guarantee is proved with some constraints on the type precision
relation. In their precision, $\forall X. X \to X$ is more precise than $\forall X. X \to \star$ but not than $\forall X. \star \to X$.


The GSF calculus. Toro et al. [37] presented the gradual polymorphic calculus (named 
GSF), which employs the Abstracting Gradual Typing (AGT) methodology. In AGT, 
casting of higher-order values is eager compared to $\lambda B$ and $F_C$. This avoids the
problem of space consumption although, as New et al. [25] pointed out, the $\eta$ principle
(which ensures $V = \lambda x. V\,x$ in call-by-value languages) is broken. To preserve
parametricity, global dynamic sealing, which does not distinguish between static and
gradual variables, is used. They also refine the presentation of evidence, which witnesses
the consistency judgement, ensuring that it holds. Instead of simple evidences such as
$\langle \alpha, \mathsf{Int} \rangle$, they employ sealing evidences $\langle \alpha^{\mathsf{Int}}, \mathsf{Int} \rangle$. GSF satisfies parametricity but not
the gradual guarantee. Importantly, they proved that the gradual guarantee is incompat- 
ible with parametricity. 


Parametricity with the Gradual Guarantee. To achieve both parametricity and the
gradual guarantee, New et al. [24] designed the $\mathrm{PolyG}^{\nu}$ calculus, which gives up the syntax of
System F and requires users to provide different sealing options. They
introduced the sealed syntax $\mathsf{seal}_X M$, which explicitly seals terms. With the user-defined
syntax, the gradual guarantee and parametricity are proved. More recently, Labrada et
al. [20] improve on GSF. They do not change the syntax of System F but insert plausible 
sealing forms during the elaboration from a gradual source language which is named 
Funk to a target cast calculus. They proved the gradual guarantee and parametricity for 
the target language, but for the source language (Funk), the gradual guarantee comes 
with a restriction for type applications, which can only be instantiated with base and 
variable types. Some of the main theorems are proved in Agda. 


Summary. In order to keep parametricity we need several compromises. For instance, 
we need to use a dynamic sealing mechanism instead of direct type substitution causing 
extra space and time consumption. In many of the earlier calculi, the gradual guaran- 
tee is not obtained. In the later calculi, the gradual guarantee is either restricted or we 
need to give up the syntax of System F. Traditionally, many works on gradual typing 
are based on two different calculi: a source gradually typed language, and a target
cast/coercion calculus where casts/coercions are explicit. The dynamic semantics is defined
by elaborating the source language to the target calculus. In other words, the semantics 
of the gradually typed language is given indirectly via a second, target language. All 
previously discussed works follow this indirect way to give the semantics to a gradually 
typed source language. 

Furthermore, none of the gradually typed polymorphic calculi supports references.
However, even for a static polymorphic calculus extended with mutable references,
obtaining parametricity is highly non-trivial. As Ahmed et al. [2] stated: “combining
mutable references with polymorphism can be extremely tricky.” From the analysis of Jaber
and Tzevelekos [18], we know that naively extending a polymorphic calculus with
mutable references breaks parametricity. The reason is that common
references can be instantiated with differently typed variables. Therefore, extending a
gradual polymorphic calculus with mutable references is non-trivial, and none of
the existing gradual languages with polymorphism supports references.

Table 1 summarizes several features and differences in existing gradually polymorphic
calculi.

                      λB      F_G     GSF     PolyG^ν  Funk    present
                      (2011)  (2017)  (2019)  (2020)   (2022)  work
Direct Substitution   ✗       ✗       ✗       ✗        ✗       ✓
System F extension    ✗       ✓       ✓       ✗        ✓       ✓
Direct Semantics      ✗       ✗       ✗       ✗        ✗       ✓
Parametricity         ✓       ✓       ✓       ✓        ✓       ✗
Gradual Guarantee     ✗       ✗       ✗       ✓        ✓       ✓
Semantics             Lazy    Lazy    Eager   Lazy     Eager   Eager
Mechanized Proofs     ✗       ✗       ✗       ✗        ✓       ✓
References            ✗       ✗       ✗       ✗        ✗       ✓

Table 1: Comparison among gradual polymorphism calculi. A ✗ denotes no and a ✓
denotes yes.


7 Conclusion 


In this paper, we design a static calculus with polymorphism and references, together
with its gradual counterpart. The gradual calculus has a direct semantics that does not
resort to a cast calculus. In the gradual calculus, the gradual guarantee is proved but we
give up parametricity. In exchange, our calculus can be simplified, since sophisticated
mechanisms such as dynamic sealing are not needed. Our calculus follows the original
semantics of System F, based on direct type substitutions, avoiding the extra space and
time costs incurred by mechanisms such as dynamic sealing. In the future, we could
investigate whether there is a way to keep both the gradual guarantee and relational
parametricity for the source language, or explore more efficient formulations of the
gradual calculus.

Modal Crash Types for Intermittent Computing

Abstract. Intermittent computing is gaining traction in application
domains such as Energy Harvesting Devices (EHDs) that experience
arbitrary power failures during program execution. To make progress,
programs require system support to checkpoint state and re-execute after
power failure by restoring the last saved state. This re-execution should 
be correct, i.e., simulated by a continuously-powered execution. We study 
the logical underpinning of intermittent computing and model check- 
point, crash, restore, and re-execution operations as computation on 
Crash types. We draw inspiration from adjoint logic and define Crash 
types by introducing two adjoint modality operators to model persistent 
and transient memory values of partial (re-)executions and the transi- 
tions between them caused by checkpoints and restoration. We define 
a Crash type system for a core calculus. We prove the correctness of 
intermittent systems by defining a novel logical relation for Crash types. 


Keywords: intermittent computing - modal Crash type - logical relation 


1 Introduction 


Intermittent computing is gaining importance in application domains that re- 
quire inaccessible or large-scale device deployments, such as wildlife monitor- 
ing [28], tiny satellites [22,29], or smart civil infrastructure [1]. As battery main- 
tenance may be infeasible in these environments, programs can instead run on 
batteryless Energy Harvesting Devices (EHDs). An EHD can run solely off en- 
ergy harvested from its environment, at the cost of being powered intermit- 
tently. The device harvests energy (e.g., via solar panel) into a re-chargeable 
buffer. Once the energy buffer is full, the device turns on and begins to compute,
consuming the stored energy. When the buffer drains, the device turns off at 
an arbitrary location until it can recharge and repeat this operational cycle. A 
power failure erases volatile execution state (e.g., the program counter), while 
nonvolatile state persists. For programs to make progress, they require
intermittent system support to save state at checkpoints and restore the saved state
after power failure, potentially causing re-execution from the last checkpoint. 

As EHDs aim to enable long-term deployments with little or no mainte- 
nance, intermittent systems must execute programs reliably despite frequent 
power failures and partial executions. Initial systems [35,43,24] relied only on in- 
formal notions of correctness that left them susceptible to memory consistency 
bugs caused by reading the results of partial executions [23] or by allowing sensor 
reads from past executions to remain in the nonvolatile memory [39]. More recent 
work [41,40,9,13] provides formal frameworks and correctness criteria for reason- 
ing about intermittent execution. More concretely, all intermittent executions of 
a program must be simulated by some continuously-powered execution [41]. In 
other words, intermittent execution should be idempotent. Even if the system in- 
duces multiple partial executions of a program due to power failure, the program 
should not generate a different result than it would on a single execution. 

The correctness of an intermittent execution relies on checkpointing, restor- 
ing, and finalizing state upon reaching the next checkpoint; mistakes in these op- 
erations can lead to incorrect, non-idempotent behavior. Few works have tried to 
understand the fundamental logical underpinning of these operations. This work 
fills this gap by formalizing checkpointing, crash, restoration, and re-execution 
as computation on Crash types. Crash types capture the core notion of inter- 
mittent computing: some values and computations persist across power failures 
and others do not. For instance, nonvolatile memory state persists across power 
failure and reboots, while volatile memory does not. Conversely, partially com- 
puted results do (or rather should) not persist across power failures, while com- 
pleted/checkpointed computations do. We call the former unstable values and 
computations and the latter stable values and computations. Our key insight is 
that the interactions between these stable and unstable components bear close 
resemblance to shifts in adjoint logic [8,36]. Computation of a stable value can 
only rely on locations that store stable values, while computation on unstable 
values can rely on both stable and unstable values. Moreover, checkpoint and 
restore operations can turn values of one type to the other. We define terms and 
their associated types so that each of the key intermittent computing operations 
must be well-typed under our Crash types. 

We define a core calculus for intermittent computing and develop a type sys- 
tem for Crash types by using the two adjoint modality operators. The Crash type 
of an intermittent computation is $C_{\mathsf{unit}} = \downarrow(\mathsf{nat} \leadsto \uparrow C_{\mathsf{unit}}) \lor \downarrow\uparrow\mathsf{unit}$, which says
that the computation will either encounter a power failure (the left disjunct), 
or succeed in producing a stable value (the right disjunct). In the former case, 
the computation is suspended until energy arrives, after which it will again act 
as an intermittent computation. This recursive definition captures the multi- 
ple re-executions of a computation under repeated power failures. To prove the 
correctness of intermittent systems, we define a novel logical relation for Crash 
types, indexed by the number of power failures, which relates a
continuously-powered execution to an intermittent execution. While intermittent computing
motivates our results, the methods we develop are generally applicable to other 
system failures with the same effect on persistent and transient storage. 

This paper makes the following technical contributions: 


— The first logical interpretation of key operations of intermittent execution. 

— Novel Crash types to specify how stable and unstable portions of the system 
and computation interact. 

— A core calculus for Crash types with progress and preservation. 

— A novel logical relation to prove the correctness of intermittent executions. 


Detailed proofs and definitions can be found in the extended TR [15]. 


2 Background 


We provide background on intermittent computing and detail how checkpoint 
systems work to store and restore program state to handle power failures. 
Intermittent Computing on EHDs. EHDs need intermittent system sup- 
port to save necessary state before power failure and to restore it after re- 
boot. When and where such checkpoints occur governs the intermittent exe- 
cution model under which software executes. The two prevailing intermittent 
execution models are just-in-time (JIT) checkpoints [5,4] and atomic execu- 
tion [23,24,43,37]. Under a JIT model, state is saved immediately prior to power 
failure so that execution resumes from the same point after reboot. Under an 
atomic execution model, state is saved at the beginning of an atomic region. If 
power fails before the end of the region, the system will reboot to the beginning 
of the region, re-executing until the region completes without power failure (akin 
to software transactions [38]). State-of-the-art intermittent systems use a hybrid 
“JIT + Atomics” model that defaults to JIT checkpoints except when there is 
an explicit atomic region [40,25,19]. Our core calculus follows this hybrid model. 

To ensure idempotence, an intermittent system must save the value of volatile 
state and often a portion of the nonvolatile state. To illustrate why, consider an 
execution of the simple program in Fig. 1. The program has four variables stored 
in nonvolatile memory: x, y, and z of type int and u of type bool. It consists 
of two code blocks: an atomic region declared with the Ckpt construct (lines 
1-7 on the left of Fig. 1) and a regular code block executed in JIT mode (lines 
8-14 on the right). A continuous execution of the atomic region with initial state 
x = 2, y = 0, z = 1, u = ff ends in x = 2, y = 1, z = 1, u = tt. Now, suppose power
fails after the execution of Line 2. Once the device recharges, the program restarts
from the start of the atomic region. If the system does not restore y’s original
value, this re-run computes an incorrect result: x = 2, y = 2, z = 1, u = ff. Thus,
to ensure idempotent execution, an intermittent system must checkpoint, i.e., 
save the value of, both volatile and nonvolatile memory. We next explain correct 
execution of the program in Fig. 1 for atomic and JIT modes. 
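
The failure scenario described above can be replayed with a small simulation. The following Python sketch is illustrative only: the dictionary-based memory model and the names `atomic_region`, `crash_after_line_2`, and `INITIAL` are assumptions made for this example and are not part of the calculus. It runs the atomic region of Fig. 1 once continuously, once with a crash and a naive re-run, and once with a crash followed by restoring the checkpointed variables y and u.

```python
def atomic_region(nv, crash_after_line_2=False):
    """Run lines 2-7 of Fig. 1 directly on the nonvolatile memory `nv`.

    Returns True if the region completed and False if power failed.
    """
    nv["y"] = nv["y"] + nv["z"]          # Line 2: y := y + z
    if crash_after_line_2:
        return False                     # power fails; nv keeps the partial update
    w = nv["x"] - nv["y"]                # Line 3: let w = x - y
    nv["u"] = w > 0                      # Lines 4-7: u := tt or u := ff
    return True


INITIAL = {"x": 2, "y": 0, "z": 1, "u": False}

# Continuously powered execution.
continuous = dict(INITIAL)
atomic_region(continuous)

# Intermittent execution without restoring y: crash, then re-run from the start.
naive = dict(INITIAL)
atomic_region(naive, crash_after_line_2=True)
atomic_region(naive)

# Intermittent execution that restores the checkpointed variables y and u.
checkpointed = dict(INITIAL)
saved = {k: checkpointed[k] for k in ("y", "u")}   # checkpoint at region entry
atomic_region(checkpointed, crash_after_line_2=True)
checkpointed.update(saved)                          # restore after reboot
atomic_region(checkpointed)
```

Only the run that restores y agrees with the continuously powered run, which is the idempotence requirement stated in the text.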


Atomic Region Execution. As EHDs are highly resource constrained, the 
system should save state judiciously; checkpointing all of nonvolatile memory is 

 1 Ckpt[al; x,z:read-only] (         8 let w = not u in
 2   y := y+z;                       9 if w then
 3   let w = x-y in                 10   x := x+y;
 4   if w>0 then                    11   w := ff
 5     u := tt                      12 else
 6   else                           13   skip;
 7     u := ff) ;                   14 skip


Fig. 1. An example program with an atomic region and a JIT region


[Figure 2: rows (0) to (6) list, for each step of the atomic region, the nonvolatile memory NV, the volatile memory V, the command being executed with its type, and the typing contexts $\Omega$ and $\Sigma$.]


Fig. 2. Intermittent execution of an atomic region. We write i for int and b for bool.


expensive and unnecessary. For example, variables in an atomic region that are 
read-only (i.e., never updated) do not change value and need not be checkpointed. 
In our example, x and z are read-only, so checkpointing y and u is enough to ensure 
correct intermittent execution. Many intermittent systems follow this design of 
checkpointing all variables that are not read-only [37,19,17,26,44,12]. Given such 
a system, Fig. 2 shows an execution of the atomic region in Fig. 1. For now, ignore
the last two columns about typing. To save and restore state, the system follows 
redo-log semantics. It records updates to checkpointed variables in a special 
volatile region, not main memory. This region clears if power fails, throwing 
out partial updates. Upon reaching the next atomic or JIT region, the system 
commits the updates by copying them back to main memory. 


Row (0) shows initial nonvolatile locations, their values, and the mapping
between variables and memory locations; locations $\ell_1$, $\ell_2$, $\ell_3$, and $\ell_4$ in the
nonvolatile memory correspond to variables x, y, z and u, respectively. When starting
the atomic region (Row (1)), the system takes a snapshot of $\ell_2$ and $\ell_4$ and stores
it in the volatile region $V_1$. We mark the original nonvolatile locations as
checkpointed with the superscript ck, i.e., $\ell_2^{ck}$ and $\ell_4^{ck}$. Checkpointed locations $\ell_2^{ck}$ and
$\ell_4^{ck}$ remain untouched for the remainder of the atomic region execution. Every
access to variables y and u will instead be associated with their volatile copy $\ell_2$
and $\ell_4$, e.g., the assignment in Line 2 is applied to the volatile logs of Row (2).

On power failure, all volatile memory clears (Row (3)), throwing out the
log. The system shuts down until more energy is harvested, at which point the
system regenerates the volatile copies $\ell_2$ and $\ell_4$ (Row (4)) and resumes execution
from Line 2. When the execution of the atomic region is complete (Row (5)),
the system commits the updated values of the checkpointed locations ($\ell_2$ and $\ell_4$)
from volatile memory to their original nonvolatile locations (Row (6)). During
execution, local variables are stored to volatile memory via a let construct, e.g.,
location $\ell_5$ for variable w on Line 3, corresponding to a volatile execution stack.
On power failure, the device clears all volatile memory, but such stack allocated
locations will be recreated upon re-execution.
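
The redo-log behavior walked through above (snapshot, logged updates, loss of the log on a crash, and final commit) can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class `AtomicRegion`, its method names, and the helper `body` are hypothetical names introduced for this illustration, memory is a plain dictionary, and local variables such as w are kept as Python locals rather than as explicit volatile locations.

```python
class AtomicRegion:
    """Toy redo-log model: updates go to a volatile log, never to NV directly."""

    def __init__(self, nv, read_only):
        self.nv = nv                                  # nonvolatile memory
        self.checkpointed = [k for k in nv if k not in read_only]
        self.log = {}                                 # volatile region

    def start(self):
        # Snapshot the checkpointed variables into the volatile log.
        self.log = {k: self.nv[k] for k in self.checkpointed}

    def read(self, var):
        # Checkpointed variables are read from their volatile copy.
        return self.log[var] if var in self.log else self.nv[var]

    def write(self, var, value):
        if var not in self.log:
            raise KeyError(f"{var} is read-only in this region")
        self.log[var] = value

    def crash(self):
        self.log = {}                                 # power failure clears volatile memory

    def commit(self):
        self.nv.update(self.log)                      # copy updates back to main memory
        self.log = {}


def body(region, fail_after_line_2=False):
    """Lines 2-7 of Fig. 1, expressed against the region's read/write interface."""
    region.write("y", region.read("y") + region.read("z"))   # Line 2
    if fail_after_line_2:
        return False
    w = region.read("x") - region.read("y")                  # Line 3
    region.write("u", w > 0)                                 # Lines 4-7
    return True


nv = {"x": 2, "y": 0, "z": 1, "u": False}
region = AtomicRegion(nv, read_only={"x", "z"})

region.start()
body(region, fail_after_line_2=True)
region.crash()
nv_after_crash = dict(nv)       # nonvolatile memory is untouched by the partial run

region.start()                  # regenerate the volatile copies and re-execute
body(region)
region.commit()
```

Because the partial update to y only ever reaches the log, the crash leaves nonvolatile memory in its initial state, and the re-execution commits the same result as a continuous run.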


JIT Region Execution. The JIT execution model prevents re-execution, so the 
intermittent system only saves and restores volatile state at checkpoints. Fig. 3 
shows the details of executing the code on the right of Fig. 1 in JIT mode. Row 
(0) shows the initial nonvolatile locations, their values, and the mapping from 
variables to locations. The system starts the JIT region by creating an empty 
context to be populated by volatile locations (Row (1)). The let construct in 
Line 8 allocates a fresh location $\ell_5$ in $V_2$ and updates the mapping to associate
variable w to $\ell_5$. On a power failure in JIT mode, the system creates a nonvolatile
copy of the volatile location $\ell_5$ just before it loses the location (Row (3)). It marks
the nonvolatile copy with the superscript ck. When restoring the program, the
system restores these copies to volatile memory and dismisses the nonvolatile
backups (Row (4)). The program then continues with the if clause on lines
9-12, finally dropping the volatile location $\ell_5$, as it is out of scope (Row (5)).
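
The JIT steps above can be mimicked with a short Python sketch. It is a simplified illustration, not the paper's semantics: the function name `run_jit_region`, the `_ck` key suffix used to mark nonvolatile backups, and the single fixed crash point (between Line 8 and Line 9) are all assumptions made for this example.

```python
def run_jit_region(nv, crash_before_if):
    """Run lines 8-14 of Fig. 1 in JIT mode on the nonvolatile memory `nv`."""
    volatile = {}                               # empty volatile context at region start
    volatile["w"] = not nv["u"]                 # Line 8: let w = not u

    if crash_before_if:
        # JIT checkpoint: copy volatile state to NV just before power is lost.
        backup = {name + "_ck": value for name, value in volatile.items()}
        nv.update(backup)
        volatile = {}                           # power failure clears volatile memory
        # Restore: move the copies back and dismiss the nonvolatile backups.
        for key in backup:
            volatile[key[:-3]] = nv.pop(key)

    if volatile["w"]:                           # Lines 9-12
        nv["x"] = nv["x"] + nv["y"]
        volatile["w"] = False
    # w goes out of scope here; its volatile location is dropped.
    return nv


# State left by the atomic region of Fig. 1: u = tt, so the branch is not taken.
no_crash = run_jit_region({"x": 2, "y": 1, "z": 1, "u": True}, crash_before_if=False)
with_crash = run_jit_region({"x": 2, "y": 1, "z": 1, "u": True}, crash_before_if=True)

# A second starting state, with u = ff, in which the branch updates x.
branch_no_crash = run_jit_region({"x": 2, "y": 1, "z": 1, "u": False}, crash_before_if=False)
branch_with_crash = run_jit_region({"x": 2, "y": 1, "z": 1, "u": False}, crash_before_if=True)
```

In both starting states the interrupted run ends in the same nonvolatile memory as the uninterrupted one, and no backup copy remains after the restore.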


[Figure 3: rows (0) to (5) list, for each step of the JIT region, the nonvolatile memory NV, the volatile memory V, the command being executed with its type, and the typing contexts $\Omega$ and $\Sigma$.]


Fig. 3. Intermittent execution of a JIT region. We write i for int and b for bool.


3 Key Ideas of Crash Types 


We present the intuition behind the stable and unstable memory types (Sec. 3.1),
Crash types which internalize checkpointing, power failure/crash, restoration,
re-execution, and finalization of atomic regions (Sec. 3.2), and the independence
principle applied to intermittent computing (Sec. 3.3).


3.1 Modal Store Types 


An unstable value is an intermediate result of an execution towards a stable value 
and will be lost upon a power failure. However, if the result of a partial execution 
is committed to a nonvolatile location, it will persist and is thus stable. To 
reflect the behavior of a memory location in its type, we introduce two (adjoint) 
modalities $\uparrow^s_u$ (read as “up shift from unstable to stable”) and $\downarrow^s_u$ (read as “down
shift from stable to unstable”), where $\uparrow^s_u \tau$ indicates that the location stores a
stable value of type $\tau$ and $\downarrow^s_u \tau$ indicates that the location stores an intermediate
result of an execution toward a value of type $\tau$. To fully capture how intermittent
execution interacts with a memory location, we also annotate the type of a 
memory location with an access qualifier, RD or CK, that represents whether the 
location is read-only or checkpointed by the system, respectively. 

In our example in Fig. 2, the read-only variable x is stored in nonvolatile
memory, so it has type $x : \uparrow^s_u \mathsf{int}@\mathsf{RD}$. The checkpointed variable y has type
$y^{ck} : \uparrow^s_u \mathsf{int}@\mathsf{CK}$ in the nonvolatile memory, while y’s volatile copy has type
$y : \downarrow^s_u \uparrow^s_u \mathsf{int}@\mathsf{CK}$. We use the context $\Omega$ to type nonvolatile memory and the
context $\Sigma$ to type volatile memory, as shown in the third columns of Figs. 2
and 3. We drop the superscript s and subscript u from the modalities for brevity.


3.2 Crash Types 


To capture the effects of intermittent execution in the type of expressions and
commands, we introduce Crash types, as the notion of stable and unstable values
is insufficient. One might expect the expression x - y to have the type $\downarrow\uparrow\mathsf{int}$
as it is a (partial) execution towards computing a stable integer value. However,
this type does not account for steps due to power failure: the crash itself,
waiting for the device to charge, restoration, and re-execution. To reflect these
runtime system steps at the type level, we assign the expression a type in the
form of a disjunction $\boxed{?} \lor \downarrow\uparrow\mathsf{int}$, where $\boxed{?}$ is a type for computations that
handle power failures. This type means that the expression either suffers a power
failure or completes its execution and evaluates to int. Next, we fill in $\boxed{?}$ for
commands and expressions. $\boxed{?}$ is a recursive type since it handles re-execution.


Commands. The Crash type for commands is $C_{\mathsf{unit}} = \downarrow(\mathsf{nat} \leadsto \uparrow C_{\mathsf{unit}}) \lor \downarrow\uparrow\mathsf{unit}$.
The right disjunct states that if no power failure occurs while executing
a command, then it computes a stable value of type unit. The left disjunct states
that on power failure, the computation continues as a function; after receiving
a (logical) energy input from the environment, it becomes a computation that
yields a stable value of a command type, i.e., $C_{\mathsf{unit}}$. This computation will execute
after the restore, which differs for atomic and JIT modes. In an atomic region, 
the system re-executes the region from the beginning, and in a JIT region, the 
system continues with the same command that was interrupted by the failure. 

Expressions. The definition of the Crash type for expressions depends on the
execution mode, just as the continuation of the program after a power failure
depends on the mode. In an atomic region, the system restores an interrupted
run of the expression to the original command enclosed in the region, so the type
of an atomic mode expression is $C^{a}_{A} = \downarrow(\mathsf{nat} \leadsto \uparrow C_{\mathsf{unit}}) \lor \downarrow\uparrow A$, where the left
disjunct is the same as that of a command. On the other hand, an interrupted
run of an expression in JIT mode will be restored to the expression itself. Hence,
the type of a JIT mode expression is $C^{j}_{A} = \downarrow(\mathsf{nat} \leadsto \uparrow C^{j}_{A}) \lor \downarrow\uparrow A$, where the left
disjunct states that after power failure and reception of the energy input, the
computation again yields a stable value of a JIT mode expression type.
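
The recursive shape of these Crash types can be mirrored by an ordinary sum type. The Python sketch below is only an analogy under stated assumptions: the names `Done`, `Crashed`, `jit_expr`, `atomic_cmd`, and `run` are hypothetical, a computation is abstracted to a number of remaining steps that each cost one unit of energy, and the modalities are not represented. `Done` plays the role of the right disjunct and `Crashed` the left disjunct, a function from an energy input (a natural number) to another result of the same type.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Union


@dataclass
class Done:
    """Right disjunct: the computation produced a stable value."""
    value: object


@dataclass
class Crashed:
    """Left disjunct: waits for an energy input, then yields a new Crash result."""
    resume: Callable[[int], "Crash"]


Crash = Union[Crashed, Done]


def jit_expr(steps_left: int, energy: int) -> Crash:
    """JIT mode: after a failure, continue from the interrupted point."""
    if steps_left == 0:
        return Done("value")
    if energy == 0:
        return Crashed(lambda charge: jit_expr(steps_left, charge))
    return jit_expr(steps_left - 1, energy - 1)


def atomic_cmd(total_steps: int, steps_left: int, energy: int) -> Crash:
    """Atomic mode: after a failure, restart the enclosing command from its beginning."""
    if steps_left == 0:
        return Done(())
    if energy == 0:
        return Crashed(lambda charge: atomic_cmd(total_steps, total_steps, charge))
    return atomic_cmd(total_steps, steps_left - 1, energy - 1)


def run(result: Crash, charges: Iterator[int]) -> tuple:
    """Feed energy inputs until the computation is Done; count the power failures."""
    failures = 0
    while isinstance(result, Crashed):
        failures += 1
        result = result.resume(next(charges))
    return result.value, failures


# Charges are assumed positive, matching the guard b > 0 on incoming energy.
jit_result = run(jit_expr(5, 2), iter([2, 2, 2]))
atomic_result = run(atomic_cmd(3, 3, 2), iter([2, 4]))
```

The JIT computation makes progress across failures and finishes after two of them, while the atomic command with three steps cannot finish on a charge of 2 and only completes once a charge of 4 arrives, because each failure sends it back to the start of the region.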


3.3 Independence Principle for Typing Intermittent Execution 


We design our typing rules to follow the rules for $\downarrow$ and $\uparrow$ modalities in adjoint
logic. We introduce two judgment categories. The first category ($J_s$) is for
deriving stable types and corresponds to the judgments of the form $\Omega \vdash \tau^s$, meaning
that the rules can rely only on stable locations to evaluate computation on a
stable type. The second category ($J_u$) is for deriving unstable types and
corresponds to the judgments of the form $\Omega; \Sigma \vdash \tau^u$, meaning that the rules can rely on
both stable and unstable locations to evaluate computation on an unstable type.
The adjoint modalities allow going back and forth between judgments $J_s$
and $J_u$, mirroring checkpointing and restoration operations. The following four
sequent calculus rules in the underlying logic govern this back-and-forth behavior 
in our system. The rules are derivable from the more general rules in prior 
work [8,34,36]; in particular, the $\uparrow L^*$ rule can be derived from a cut rule and
$\downarrow L$. Typical of sequent calculus style rules, we read them bottom-up and match
each execution step of a command with the reading of a corresponding rule. 
Next, we illustrate this matching using the execution steps in Figs. 2 and 3. 


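$$
\dfrac{\Omega;\ \cdot \vdash \tau^u}{\Omega \vdash \uparrow\tau^u}\ \uparrow R
\qquad
\dfrac{\Omega, \uparrow A^u;\ \Sigma, \downarrow\uparrow A^u \vdash \tau^u}{\Omega, \uparrow A^u;\ \Sigma \vdash \tau^u}\ \uparrow L^*
\qquad
\dfrac{\Omega \vdash \tau^s}{\Omega;\ \Sigma \vdash \downarrow\tau^s}\ \downarrow R
\qquad
\dfrac{\Omega, \uparrow A^u;\ \Sigma \vdash \tau^u}{\Omega;\ \Sigma, \downarrow\uparrow A^u \vdash \tau^u}\ \downarrow L
$$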


Shifts in Atomic Mode (Fig. 2): A combination of $\uparrow R$ and two $\uparrow L^*$ rules
corresponds to creating a volatile log from the nonvolatile locations when starting
the atomic region, i.e., the step from Row (0) to Row (1). The last two columns in
Row (0) correspond to the conclusion of a $\uparrow R$ rule: $\Omega_0 \vdash \uparrow C_{\mathsf{unit}}$. An application
of $\uparrow R$ from bottom to top drops the $\uparrow$ modality from the type of the program and
opens an empty volatile region, i.e., $\Omega_0; \cdot \vdash C_{\mathsf{unit}}$. Next, one application of $\uparrow L^*$
copies the variable y of type $\uparrow\mathsf{int}$ to the volatile memory with the type $\downarrow\uparrow\mathsf{int}$.
Similarly, the next application of $\uparrow L^*$ copies the variable u of type $\uparrow\mathsf{bool}$ to
the volatile memory with the type $\downarrow\uparrow\mathsf{bool}$. The same combination corresponds
to creating a volatile log from a nonvolatile location when restarting the atomic
region, i.e., the step from Row (3) to Row (4), again copying variables y and u
to the volatile memory.

The $\downarrow R$ rule corresponds to a power failure, which erases the volatile memory
$\Sigma$. From Row (2) to Row (3) in Fig. 2, the system loses the volatile locations of y
and u and closes off the volatile context. Row (2) corresponds to the conclusion
of the rule, and Row (3) corresponds to its premise. The type of the command in
Row (2) changes from $C_{\mathsf{unit}}$ to $\downarrow(\mathsf{nat} \leadsto \uparrow C_{\mathsf{unit}})$ (by another $\lor$-R rule as a crash
is detected), and then to the type $\mathsf{nat} \leadsto \uparrow C_{\mathsf{unit}}$ in Row (3).

Finally, a $\downarrow L$ rule combined with a standard weakening rule and a $\downarrow R$ rule
corresponds to the final commit of the volatile context, i.e., stepping from Row
(5) to Row (6). The nonvolatile context drops the locations y and u of types
$\uparrow\mathsf{int}$ and $\uparrow\mathsf{bool}$, respectively, by a weakening rule. These two variables map to
the locations with outdated values. Next, the volatile locations of y and u in
$\Sigma'$, which contain the up-to-date values, commit their values to the nonvolatile
context by a $\downarrow L$ rule. Then, a $\downarrow R$ rule closes off the remaining volatile context,
which contains w of type $\downarrow\uparrow\mathsf{int}$. The type of the command in Row (2) changes
from $C_{\mathsf{unit}}$ to $\downarrow\uparrow\mathsf{unit}$ (by a separate $\lor$-R rule as the system detects a successful
execution) and from that to type $\uparrow\mathsf{unit}$ in Row (6).


Shifts in JIT Mode (Fig. 3): A $\uparrow R$ rule corresponds to creating an empty
volatile context $\Sigma_1$ when starting the JIT region, i.e., the step from Row (0)
to Row (1). A combination of the $\downarrow L$ rule and the $\downarrow R$ rule corresponds to a power
failure, i.e., the step from Row (2) to Row (3). A $\downarrow L$ rule copies the location
w of type $\downarrow\uparrow\mathsf{bool}$ from volatile memory ($\Sigma$) to nonvolatile memory ($\Omega$). A $\downarrow R$ rule
closes off the (empty) nonvolatile memory. As in atomic mode, a combination
of $\uparrow R$ and $\uparrow L^*$ rules corresponds to creating a volatile log from a nonvolatile
location when restarting the command after the failure, i.e., the step from Row
(3) to Row (4). The $\uparrow R$ rule clears a portion of volatile memory, and the $\uparrow L^*$
rule copies variable w from nonvolatile memory into volatile memory. We need
an extra weakening rule to eliminate the remaining variable w in nonvolatile
memory. The dropping of volatile memory at the end of execution (Row (5)) is
not a modal step, but rather follows from a standard rule for the let clause.


4 A Basic Calculus for Intermittent Execution 


We present the syntax, semantics, and the Crash type system for a basic calculus. 


4.1 Syntax 


The syntactic constructs are summarized in Fig. 4. Expressions include con- 
stants, variables, and binary operations while commands include assignments, 
mutable let bindings, sequencing, and if branching. A program consists of
sequenced blocks of commands and atomic regions, denoted Ckpt[aID, p](c) with a
unique identifier aID, read-only variables p, and the enclosed command c.

Nonvolatile memory (NV) and volatile memory (V) map locations $\ell$ to values.
Each location is annotated with its access mode q (RD or CK). The nonvolatile
memory location $\ell^{ck}$ is the checkpointed copy of location $\ell$ in volatile memory.
The context $\gamma$ maps variable names to memory locations. Access mode qualifiers
in V and NV have constrained values (to be discussed in the semantics).
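
Command, expression, and memory

values            $v ::= n \mid \mathsf{tt} \mid \mathsf{ff} \mid \ell$
exprs             $e ::= v \mid e \odot e$
cmds              $c ::= \mathsf{skip} \mid \mathsf{let}\ x = e\ \mathsf{in}\ c \mid c; c \mid \mathsf{if}\ e\ \mathsf{then}\ c\ \mathsf{else}\ c \mid x := e$
progs             $p ::= \mathsf{Ckpt}[aID, p](c); p \mid c; p \mid \mathsf{skip}$
access qualifier  $q ::= \mathsf{CK} \mid \mathsf{RD}$
var loc map       $\gamma ::= \cdot \mid \gamma, x \mapsto \ell$
nonvolatile mem   $NV ::= \cdot \mid \ell@q \mapsto v, NV \mid \ell^{ck}@\mathsf{CK} \mapsto v, NV$
volatile mem      $V ::= \cdot \mid \ell@\mathsf{CK} \mapsto v, V$

Instructions, statements, and configurations

commands          $c ::= \cdots \mid c ;_W c$
crash instrs      $\iota ::= \downarrow\varepsilon \,\#\, \mathsf{in}(b > 0, \uparrow\kappa) \mid \varepsilon \,\#\, \mathsf{in}(b > 0, \uparrow\kappa) \mid \uparrow\kappa$
continuations     $\kappa ::= c \mid e$
statements        $s ::= \kappa \mid \iota \mid p$
energy level      $g ::= \cdot \mid n$
charge stream     $\chi ::= n :: \chi$
exec. mode        $Md ::= aID(c) \mid \mathsf{jit}$
open config       $K_o ::= (\gamma \mid Md \mid g \mid NV \mid V \mid s) \mid (\gamma \mid Md \mid g \mid NV \mid s)$
closed config     $K_c ::= [\chi \triangleright \varepsilon] \otimes K_o$


Fig. 4. Summary of syntax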


The runtime instruction $c_1 ;_W c_2$ is used for evaluating $c_1$ under the
execution context $W$. To model energy harvesting from the environment, we assume a
unique external energy channel, $\varepsilon$, from which the system receives energy. Three
crash instructions control the system in the event of a power failure. The
instruction $\downarrow\varepsilon \,\#\, \mathsf{in}(b > 0, \uparrow\kappa)$ models the system that faces a power failure, where $\kappa$ is
the interrupted command or expression, and $b > 0$ is a guard to ensure that the
bound incoming energy variable b is positive. The instruction $\varepsilon \,\#\, \mathsf{in}(b > 0, \uparrow\kappa)$
models the system awaiting an energy input to be bound to b. The instruction
$\uparrow\kappa$ models the system ready to restore memory and re-execute.

We write $K_o$ to denote an open system configuration, consisting of the
mapping $\gamma$, the mode of execution Md (i.e., atomic or JIT), energy available for this
execution g, memories, and the statement s to be executed. The energy level ($\cdot$)
models the state right after power failure. We close an open configuration with
$[\chi \triangleright \varepsilon]$; we connect it via an external energy channel $\varepsilon$ to an infinite charging
stream $\chi$ of natural numbers, which models available energy the configuration
harvests from the environment at each power failure point for re-execution.

We call a configuration that cannot take a step a value configuration (value
for short). An open configuration of form $(\cdots \mid g \mid \cdots \mid s)$ is a value, i.e.,
$\mathsf{Val}(\cdots \mid g \mid \cdots \mid s)$, if either s is a constant or skip, it has depleted all energy for
this execution ($g = 0$), or s is a crash instruction. The latter two cases are values
because they cannot take a step without interacting with the environment or
performing operations on the volatile and nonvolatile memory specific to handling
power failures. A closed configuration is a value only if the statement s is skip
with some energy left ($g > 0$). We list all values in the extended TR [15].


4.2 Operational Semantics 


Top-level Program Execution. The top-level semantic rules for setting up 
and finalizing the atomic and JIT execution contexts are shown in Fig. 5. The 
P-CKPT rule applies if the next code block is an atomic region. The nonvolatile 

[Figure 5: inference rules P-CKPT and P-SEQ. P-CKPT has premises $n > 0$, $\mathsf{InitWorld}_a(NV; p; \gamma) = NV_0, V_0$, a multi-step evaluation of $c_0$ in mode $aID(c_0)$ from memories $NV_0$ and $V_0$ to skip with memories $NV'$ and $V'$, $n' > 0$, and $NV_1 = \mathsf{FinWorld}_a(NV'; V')$. P-SEQ has premises $n > 0$, $n' > 0$, and a multi-step evaluation of $c$ in mode jit, starting from an empty volatile memory, to skip.]


Fig. 5. Closed configuration semantics for programs


NVo and volatile Vo locations are initialized based on a given NV, declared read- 
only variables p, and their mapping y to locations. The InitWorldg function (a) 
changes the qualifier of locations in NV that are declared as read-only in p from 
CK to RD, (b) creates Vo by copying the rest of the locations of NV that still have 
qualifier CK, and (c) marks the original version of the locations £ in NV that 
still have qualifier CK as checkpointed (Zex). This part corresponds to the step 
from Row (0) to Row (1) in Fig. 2. The closed configuration of co is evaluated 
until completion, using the rules in Fig. 6. This execution may undergo several 
power failures and corresponds to the steps from Row (1) to Row (5) in Fig. 2. 
Finally, the FinWorldg function closes off atomic regions, finalizing the volatile 
and nonvolatile locations. FinWorldg (a) copies the values of volatile locations in 
V’ that have a checkpointed version into NV’, (b) removes CK from the locations 
in NV’, i.e., converts lex to £, and (c) replaces the RD qualifier of the locations in 
NV’ with CK. This corresponds to the step from Row (5) to Row (6) in Fig. 2. 
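The effect of the two memory functions on an atomic region can be sketched on dictionaries. This is only an illustration under simplifying assumptions: memories map a location name to a (qualifier, value) pair, a checkpointed location ℓ_ck is encoded as the key `("ck", ℓ)`, and the read-only set is given directly as locations rather than through γ. The names `init_world_d` and `fin_world_d` are hypothetical stand-ins for InitWorld_d and FinWorld_d.

```python
CK, RD = "CK", "RD"


def init_world_d(nv, read_only):
    """Sketch of InitWorld_d: returns (NV_0, V_0).

    nv maps location -> (qualifier, value); all entries are assumed to start as CK.
    """
    nv0, v0 = {}, {}
    for loc, (q, val) in nv.items():
        if loc in read_only:
            nv0[loc] = (RD, val)           # (a) read-only locations: CK becomes RD
        else:
            v0[loc] = (q, val)             # (b) volatile working copy
            nv0[("ck", loc)] = (q, val)    # (c) original is marked as checkpointed
    return nv0, v0


def fin_world_d(nv, v):
    """Sketch of FinWorld_d: commits the volatile log and resets qualifiers."""
    out = {}
    for key, (q, val) in nv.items():
        if isinstance(key, tuple):         # a checkpointed location ("ck", loc)
            loc = key[1]
            # (a) take the value from the volatile log, (b) drop the ck mark
            out[loc] = (CK, v[loc][1] if loc in v else val)
        else:
            out[key] = (CK, val)           # (c) RD goes back to CK
    return out
```

Volatile locations without a checkpointed counterpart, such as temporaries allocated inside the region, are simply not copied back by `fin_world_d`.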


The P-SEQ rule applies when the next code block is a regular command c. 
The closed configuration of c with an empty initial set of volatile locations is 
fully evaluated. This corresponds to the steps from Row (0) to Row (1) and Row 
(1) to Row (5) in Fig. 3. Then the resulting volatile locations V’ scoped in c are 
dropped, corresponding to the step from Row (5) to Row (6) in Fig. 3. 


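Command Execution (Closed Config). We summarize rules for a closed
configuration in the top part of Fig. 6. Rule D-STEP steps the closed command
configuration when the corresponding open configuration steps. Next, we explain
the trio of power failure, charge, and restore rules. When the energy for this
execution is depleted (i.e., g = 0), the D-CRASH rule applies, stepping the system
to the crash instruction ↓ε # in(b > 0; ↑κ). Next, the D-S-JIT or D-S-AID rule applies
and operates on volatile memory based on the execution mode Md. In JIT mode,
D-S-JIT checkpoints and stores all volatile memory in nonvolatile locations. In
atomic mode, D-S-AID drops all volatile memory locations. Then, D-CHARGE
applies and inputs a natural number n > 0 from the energy channel, replenishing
the configuration's energy level for re-execution. Finally, the program is restored
via D-RESTORE-JIT or D-RESTORE-AID, which copy checkpointed locations
into volatile memory. D-RESTORE-JIT drops the checkpointed regions and steps
to the interrupted command κ, while D-RESTORE-AID keeps the checkpointed
regions and steps to the original command c_0 in the atomic region.

Closed Configuration Semantics for Commands and Crash Instructions

\[
\frac{\gamma \mid Md \mid n \mid NV \mid V \mid c \;\longrightarrow\; \gamma' \mid Md \mid n' \mid NV' \mid V' \mid c'}
{[\chi \triangleright \varepsilon] \otimes \gamma \mid Md \mid n \mid NV \mid V \mid c
  \;\Longrightarrow\;
  [\chi \triangleright \varepsilon] \otimes \gamma' \mid Md \mid n' \mid NV' \mid V' \mid c'}
\;(\text{D-STEP})
\]

\[
\frac{}{[\chi \triangleright \varepsilon] \otimes \gamma \mid Md \mid 0 \mid NV \mid V \mid c
  \;\Longrightarrow\;
  [\chi \triangleright \varepsilon] \otimes \gamma \mid Md \mid \cdot \mid NV \mid V \mid {\downarrow}\varepsilon \,\#\, \mathrm{in}(b > 0; {\uparrow}c)}
\;(\text{D-CRASH})
\]

\[
\frac{Md = \mathrm{jit}}
{\begin{array}{l}
[\chi \triangleright \varepsilon] \otimes \gamma \mid Md \mid \cdot \mid NV \mid V \mid {\downarrow}\varepsilon \,\#\, \mathrm{in}(b > 0; {\uparrow}\kappa) \\
\quad\Longrightarrow\; [\chi \triangleright \varepsilon] \otimes \gamma \mid Md \mid \cdot \mid NV, V_{ck} \mid \varepsilon \,\#\, \mathrm{in}(b > 0; {\uparrow}\kappa)
\end{array}}
\;(\text{D-S-JIT})
\]

\[
\frac{Md = \mathrm{aID}(c_0) \qquad \gamma' \subseteq \gamma \qquad \mathrm{range}(\gamma') = \mathrm{dom}(NV)}
{\begin{array}{l}
[\chi \triangleright \varepsilon] \otimes \gamma \mid Md \mid \cdot \mid NV \mid V \mid {\downarrow}\varepsilon \,\#\, \mathrm{in}(b > 0; {\uparrow}\kappa) \\
\quad\Longrightarrow\; [\chi \triangleright \varepsilon] \otimes \gamma' \mid Md \mid \cdot \mid NV \mid \varepsilon \,\#\, \mathrm{in}(b > 0; {\uparrow}\kappa)
\end{array}}
\;(\text{D-S-AID})
\]

\[
\frac{n > 0}
{[n :: \chi \triangleright \varepsilon] \otimes \gamma \mid Md \mid \cdot \mid NV \mid \varepsilon \,\#\, \mathrm{in}(b > 0; {\uparrow}\kappa)
  \;\Longrightarrow\;
  [\chi \triangleright \varepsilon] \otimes \gamma \mid Md \mid n \mid NV \mid {\uparrow}\kappa}
\;(\text{D-CHARGE})
\]

\[
\frac{NV = NV', NV''_{ck}}
{[\chi \triangleright \varepsilon] \otimes \gamma \mid \mathrm{jit} \mid n \mid NV \mid {\uparrow}\kappa
  \;\Longrightarrow\;
  [\chi \triangleright \varepsilon] \otimes \gamma \mid \mathrm{jit} \mid n \mid NV' \mid NV'' \mid \kappa}
\;(\text{D-RESTORE-JIT})
\]

\[
\frac{NV = NV', NV''_{ck}}
{[\chi \triangleright \varepsilon] \otimes \gamma \mid \mathrm{aID}(c_0) \mid n \mid NV \mid {\uparrow}\kappa
  \;\Longrightarrow\;
  [\chi \triangleright \varepsilon] \otimes \gamma \mid \mathrm{aID}(c_0) \mid n \mid NV \mid NV'' \mid c_0}
\;(\text{D-RESTORE-AID})
\]

Selected expression and command semantics

\[
\frac{\gamma = \gamma', x \mapsto \ell \qquad V = \ell@q \mapsto v, V' \qquad n = n' + 1}
{\gamma \mid Md \mid n \mid NV \mid V \mid x \;\longrightarrow\; \gamma \mid Md \mid n' \mid NV \mid V \mid v}
\;(\text{D-V-READ})
\]

\[
\frac{\begin{array}{c}
\mathrm{Val}(\gamma \mid Md \mid n \mid NV \mid V \mid e) \\
V = V', \ell@q \mapsto v \qquad q \neq \mathrm{RD} \qquad \gamma = \gamma', x \mapsto \ell \qquad n = n' + 1
\end{array}}
{\gamma \mid Md \mid n \mid NV \mid V \mid x := e \;\longrightarrow\; \gamma \mid Md \mid n' \mid NV \mid V', \ell@q \mapsto e \mid \mathrm{skip}}
\;(\text{D-ASSIGN-V})
\]

Fig. 6. Statement steps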
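The crash, charge, and restore cycle can be simulated with a small driver loop. The sketch below is a deliberately simplified model, not the formal semantics: a region is a list of Python functions that each mutate the volatile store and cost one unit of energy, and the function name `run_region`, its parameters, and the list of charges are all assumptions made for illustration.

```python
def run_region(mode, cmds, v0, energy, charges):
    """Run a region to completion under power failures.

    mode    -- "jit" or "atomic"
    cmds    -- list of functions, each mutating the volatile dict (one energy unit each)
    v0      -- initial volatile store (for atomic regions, the checkpointed values)
    energy  -- initial energy level
    charges -- iterable of positive energy inputs (the charging stream)
    Returns the final volatile store and the number of power failures.
    """
    charges = iter(charges)
    v, pc, failures = dict(v0), 0, 0
    # A closed configuration is finished only at skip with energy left.
    while pc < len(cmds) or energy == 0:
        if energy == 0:                       # crash: energy is depleted
            failures += 1
            if mode == "jit":                 # JIT: checkpoint all volatile state
                ckpt_v, ckpt_pc = dict(v), pc
            else:                             # atomic: drop volatile state
                ckpt_v, ckpt_pc = dict(v0), 0
            energy = next(charges)            # charge: input a new energy level
            if energy <= 0:
                raise ValueError("the energy channel must provide n > 0")
            v, pc = dict(ckpt_v), ckpt_pc     # restore and resume or restart
            continue
        cmds[pc](v)
        energy -= 1
        pc += 1
    return v, failures
```

In JIT mode the run resumes at the interrupted command, while in atomic mode it restarts from the first command with the checkpointed values, which is why an increment inside an atomic region is applied only once in the final result.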

Command/Expression Execution (Open Config). The rules for executing
commands and expressions in an open configuration are standard. We present
a selection of them at the bottom of Fig. 6. Each step decrements the energy
level by one. The rules ensure that a checkpointed location ℓ_ck in NV is not read
by the program, as it could store outdated data, and is not written to, as this
would tamper with the checkpointed value.
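A read and a write step in the style of D-V-READ and D-ASSIGN-V can be sketched as follows. The encoding is an assumption for illustration: γ is a dictionary from variables to locations, the volatile memory maps a location to a (qualifier, value) pair, and the function names are hypothetical. Checkpointed locations live in NV under separate keys, so they are never reachable through γ here.

```python
RD = "RD"


def read_var(gamma, v, x, energy):
    """One read step: look up x through gamma and spend one unit of energy."""
    if energy < 1:
        raise RuntimeError("no energy left: the configuration is a value")
    _, val = v[gamma[x]]
    return val, energy - 1


def assign_var(gamma, v, x, value, energy):
    """One assignment step: refuse read-only locations, spend one unit of energy."""
    if energy < 1:
        raise RuntimeError("no energy left: the configuration is a value")
    loc = gamma[x]
    q, _ = v[loc]
    if q == RD:
        raise PermissionError("location is read-only")
    new_v = dict(v)
    new_v[loc] = (q, value)
    return new_v, energy - 1
```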


4.3 Types, Typing Contexts, and Judgments 


This section introduces the typing judgments used in our static typing.

(J_u)  Md | b R 0 : nat | Ω; Σ ⊢ c :: C_unit                        c could crash
(J_u)  Md | b : nat | Ω; Σ ⊢ skip :: ↓↑unit                         c will not crash
(J_s)  Md | b : nat | Ω ⊢ skip :: ↑unit                             after commit

(J_u)  Md | b R 0 : nat | Ω; Σ ⊢_RD e :: C_A                        e read, could crash
(J_u)  Md | b : nat | Ω; Σ ⊢_RD v :: ↓↑A                            e read, no crash
(J_s)  Md | b : nat | Ω ⊢_RD v :: ↑A                                e read, commit

(J_u)  Md | b : nat | Ω; Σ ⊢_WT x :: ↓↑A                            write on x, no crash
(J_s)  Md | b : nat | Ω ⊢_WT x :: ↑A                                write on x, commit
(J_s)  Md | b : nat | Ω ⊢ p :: ↑C_unit                              before execution
(J_u)  Md | b = 0 : nat | Ω; Σ ⊢ κ :: C_τ                           about to crash

(J_u)  Md | · | Ω; Σ ⊢ ↓ε # in(b > 0, ↑κ) :: ↓(nat ⇝ ↑C_τ)          crash state

(J_s)  Md | · | Ω ⊢ ε # in(b > 0, ↑κ) :: nat ⇝ ↑C_τ                 waiting for energy
(J_s)  Md | b > 0 : nat | Ω ⊢ ↑κ :: ↑C_τ                            before re-execution


Table 1. Typing judgment summary


Types and Static Context. Our types are summarized below. The two modalities
stratify types into the varieties stable (τ^s) and unstable (τ^u). The base store
types int and bool are considered unstable. A type variable ν denotes a type
in the set {C_unit, C_int, C_bool} and implements the recursive nature of Crash types.
We include the connectives ∨ and ⇝ solely for the purpose of defining Crash
types; they are not used elsewhere. Defining Crash types using these connectives
will allow us to define the logical relation in Sec. 5 based on the intended
meaning of its index type. Some well-formed types, e.g., nat ⇝ nat ⇝ ↑unit,
are not accepted by our type system introduced in Sec. 4.4. These types have
no inhabitants, i.e., no well-typed configuration has any of these types.


store types   A ::= int | bool          stable types     τ^s ::= nat ⇝ τ^s | ↑τ^u

basic types   τ ::= unit | A            unstable types   τ^u ::= τ | ↓τ^s | τ^u ∨ τ^u | ν


Volatile store typing context      Σ ::= · | x : ↓↑A@CK, Σ
Nonvolatile store typing context   Ω ::= · | x : ↑A@RD, Ω | ℓ_ck : ↑A@CK, Ω
                                       | x : ↑A@CK, Ω


A nonvolatile store typing context Ω assigns stable types to nonvolatile location
variables, i.e., all variables in Ω have a type of the form ↑A. A volatile
store typing context Σ assigns unstable types to volatile location variables, i.e.,
variables in Σ are of the type ↓↑A. ℓ_ck refers to a location that has been
checkpointed. In the atomic mode, ℓ_ck has an active volatile log in Σ.
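The stratification into stable and unstable types can be checked mechanically. The sketch below encodes the grammar with hypothetical Python classes (`Base`, `Var`, `Down`, `Up`, `Arrow`, `Or` are names chosen here, not taken from the paper) and implements only well-formedness, not the typing rules of Sec. 4.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:      # basic type: "unit", "int", or "bool"
    name: str


@dataclass(frozen=True)
class Var:       # type variable, e.g. "C_unit"
    name: str


@dataclass(frozen=True)
class Down:      # down-shift applied to a stable type
    body: object


@dataclass(frozen=True)
class Up:        # up-shift applied to an unstable type
    body: object


@dataclass(frozen=True)
class Arrow:     # nat ~> (stable type)
    body: object


@dataclass(frozen=True)
class Or:        # disjunction of two unstable types
    left: object
    right: object


def is_stable(t) -> bool:
    if isinstance(t, Arrow):
        return is_stable(t.body)
    if isinstance(t, Up):
        return is_unstable(t.body)
    return False


def is_unstable(t) -> bool:
    if isinstance(t, (Base, Var)):
        return True
    if isinstance(t, Down):
        return is_stable(t.body)
    if isinstance(t, Or):
        return is_unstable(t.left) and is_unstable(t.right)
    return False
```

With this encoding, the unfolding of the Crash type for unit is a well-formed unstable type, and nat ⇝ nat ⇝ ↑unit is a well-formed stable type even though the type system of Sec. 4.4 never assigns it to a configuration.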


Typing Judgments. Table 1 summarizes all the typing judgments. These judgments
are parameterized over the execution mode Md of the expression or command
to be typed. The judgment also tracks a variable b corresponding to the
current energy level of this execution. b ranges over natural numbers (nat) and
is constrained by a relation R ∈ {>, ≥} or is set to 0, where b ≥ 0 is unconstrained.
The constraint on b determines whether or not a command can evaluate
to a value without power failure. There are three judgments for command typing.
The first judgment is used when the command has not yet successfully finished
executing; its next step, depending on its constraint R, may or may not crash.
When the command reaches type ↓↑unit, b no longer needs to be constrained
as the execution succeeded without power failure. The second judgment invokes
the third judgment to type the configuration after the volatile log is committed:
in the typing rule for committing the volatile log, the conclusion is of the form of
the second judgment and the premise is of the form of the third. For expression
typing, we distinguish expressions on the right of an assignment (being read)
from those on the left of an assignment (being written to) via subscripts RD and
WT, respectively. The expressions that are being written to are only of the simple
form x. As no execution is required to evaluate x, we consider its judgment
crash free, so no constraint is required on b. For program typing, we only have
one judgment that refers to the type of the program before the execution of its
next block starts. The rest of the judgments type states after a crash. The first
judgment uses the constraint b = 0, which corresponds to the power failure condition.
It invokes the second judgment, which types a state right after crash. The
third judgment types the state awaiting energy to continue re-execution, and the
final judgment types the state that is ready for restoration and re-execution.

\[
\frac{\mathrm{jit} \mid b \geq 0 : \mathrm{nat} \mid \Omega; \cdot \vdash_{\emptyset} c :: C_{\mathrm{unit}}
  \qquad b : \mathrm{nat} \mid \Omega \vdash p :: {\uparrow}C_{\mathrm{unit}}}
{b : \mathrm{nat} \mid \Omega \vdash c; p :: {\uparrow}C_{\mathrm{unit}}}
\;(\text{T-P-SEQ})
\]

\[
\frac{\begin{array}{c}
\Omega_0 \mid \Sigma_0 = \mathrm{InitWorld}_s(\Omega; \rho) \\
\mathrm{Sig} = \{\mathrm{aID}(c_0) \mid b \geq 0 : \mathrm{nat} \mid \Omega_0; \Sigma_0 \vdash c_0 :: C_{\mathrm{unit}}\} \\
\mathrm{aID}(c_0) \mid b \geq 0 : \mathrm{nat} \mid \Omega_0; \Sigma_0 \vdash_{\mathrm{Sig}} c_0 :: C_{\mathrm{unit}}
  \qquad b : \mathrm{nat} \mid \Omega \vdash p :: {\uparrow}C_{\mathrm{unit}}
\end{array}}
{b : \mathrm{nat} \mid \Omega \vdash \mathrm{Ckpt}[\mathrm{aID}, \rho](c_0); p :: {\uparrow}C_{\mathrm{unit}}}
\;(\text{T-P-CKPT})
\]

Fig. 7. Program typing


4.4 Typing Rules 


Program Typing. Fig. 7 shows the typing rules for programs. The P-SEQ rule
types program c; p by first typing c under jit mode, requiring b ≥ 0, and then
typing the rest of the program. The volatile memory context is empty for now,
but will be populated when the let commands allocate new volatile locations.
The P-CKPT rule types the command c_0 enclosed in an atomic region under
the mode aID(c_0) and then types the rest of the program p. The first premise
sets up the initial typing contexts for nonvolatile and volatile memories, as illustrated
in Fig. 2. The partial function InitWorld_s initializes the volatile memory
by creating a log of variables in Ω that are not read-only. Ω can be uniquely
split into Ω^c and Ω^r, where Ω^r is the set of all read-only locations in Ω, and Ω^c
is the set of all locations that are not read-only. This function is defined below:
Ω_0 | Σ_0 = InitWorld_s(Ω; ρ) iff ρ ⊆ dom(Ω), Ω_0 = Ω^r, Ω^c_ck, and Σ_0 = ↓Ω^c,
where Ω = Ω^r, Ω^c and Ω^r = Ω↾ρ.
Here Ω^r = Ω↾ρ is the subset of Ω whose locations are declared in ρ to be
read-only, and Ω^c are all other locations in Ω. The context Ω^c_ck is defined as
Ω^c_ck = {x_ck : ↑A@q | x : ↑A@q ∈ Ω^c}, and the context ↓Ω^c is defined as
↓Ω^c = {x : ↓↑A@q | x : ↑A@q ∈ Ω^c}. If the set of read-only variables ρ is not
contained in the domain of Ω, then the function InitWorld_s is not defined.
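The static initialization can be sketched as a partial function on dictionaries. The encoding below is an assumption made for illustration: a context maps a variable name to a (type, qualifier) pair with types written as strings, the checkpointed copy of x is stored under the name `x_ck`, and `None` models the case where the function is undefined. Entries of the read-only part are copied unchanged, following the definition as stated above.

```python
def init_world_s(omega, rho):
    """Sketch of InitWorld_s: returns (Omega_0, Sigma_0), or None if undefined.

    omega -- dict: variable -> (type string such as "↑int", qualifier)
    rho   -- set of variables declared read-only
    """
    if not set(rho) <= set(omega):
        return None                          # rho is not within dom(Omega)
    omega_r = {x: t for x, t in omega.items() if x in rho}
    omega_c = {x: t for x, t in omega.items() if x not in rho}
    omega0 = dict(omega_r)
    for x, (ty, q) in omega_c.items():
        omega0[x + "_ck"] = (ty, q)          # checkpointed version stays nonvolatile
    # the volatile log gives each non-read-only variable a down-shifted type
    sigma0 = {x: ("↓" + ty, q) for x, (ty, q) in omega_c.items()}
    return omega0, sigma0
```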

In rules P-SEQ and P-CKPT, the command typing judgment in the premise
makes use of a signature (subscripts ∅ and Sig, respectively) to type check
the command relative to the signature. The signature is populated at different
stages of type checking the JIT and atomic regions. In an atomic region, rule
T-P-CKPT populates the signature at the beginning of the region with the initial
judgment, which includes the region's original command c_0 and static memory
context Ω_0; Σ_0. The region is then typed relative to the signature. In JIT mode,
the signature is populated later with the judgment just at the point of the failure
(rule T-ENOUGH?). The program remembers that it built a typing derivation for
the judgment in the signature such that when it restores from a power failure, it
refers to the signature and checks that the restored judgment matches the one
stored in the signature without needing to derive it again. This makes the typing
derivations finitary and inductive.


Command and Expression Typing. Fig. 8 shows selected typing rules for
commands. The T-SKIP rule declares the command skip as the stable type ↑unit.
Rule T-V-SUCC applies when the command successfully completes its execution
and still has at least one unit of energy available (b > 0) to conclude the execution. In
this case, we close off the energy level variable and continue typing the command
against the type ↓↑unit. Rule T-C-SHIFT is invoked by T-V-SUCC and
updates the memory typing contexts by removing checkpointed locations in Ω,
as now they are not needed, and making locations in Σ stable, as now they are
committed. This corresponds to the last step of Fig. 2.

The rules T-LET and T-ASSIGN are mostly standard except that we consider
crashes. For example, in typing the assign command x := e, the first premise
of T-ASSIGN considers the type of expression e to be the Crash type C_A, but
in the second premise we require the location x to be of type ↓↑A, i.e., the
location only considers the type corresponding to the case where execution of e
can be completed successfully. The reason is that the assignment only occurs if
the execution of e is successful. The constraint on the energy levels for premises
goes back to b ≥ 0, as we use one energy unit to deconstruct these commands.

The rule T-ENOUGH? checks two premises based on the value of b ≥ 0. The
third premise, a crash judgment, corresponds to the case where b = 0 (typing
rules for crash judgments are given later in this section) and the fourth premise
corresponds to the case where b > 0. The condition b > 0 states that there is at
least one unit of energy available to decompose one command construct, e.g., via
T-LET or T-ASSIGN. This rule populates the signature for JIT commands. The
second premise states that the signature remains intact if the mode is atomic, but
is populated by Sig' if the mode is JIT. In the JIT mode, after a power failure,
the command c is restored to itself, and Sig' remembers that the well-typedness
of the command when the energy level is non-negative has been checked already.

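Expression typing rules are very similar to those of the commands. Fig. 8
shows a few selected rules. The T-LOC-WRITE and T-LOC-READ rules match
the location variable x with an existing variable inside the context. T-LOC-WRITE
performs an extra check to make sure that x is not a read-only variable.

Commands

\[
\frac{}{Md \mid b : \mathrm{nat} \mid \Omega \vdash_{\mathrm{Sig}} \mathrm{skip} :: {\uparrow}\mathrm{unit}}
\;(\text{T-SKIP})
\]

\[
\frac{\Sigma = {\downarrow}\Sigma' \qquad \Omega = \Omega', \Omega''_{ck} \qquad
  Md \mid b : \mathrm{nat} \mid \Omega', \Sigma' \vdash_{\mathrm{Sig}} \mathrm{skip} :: {\uparrow}\mathrm{unit}}
{Md \mid b : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} \mathrm{skip} :: {\downarrow}{\uparrow}\mathrm{unit}}
\;(\text{T-C-SHIFT})
\]

\[
\frac{Md \mid b : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} \mathrm{skip} :: {\downarrow}{\uparrow}\mathrm{unit}}
{Md \mid b > 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} \mathrm{skip} :: \tau \lor {\downarrow}{\uparrow}\mathrm{unit}}
\;(\text{T-V-SUCC})
\]

\[
\frac{\begin{array}{c}
Md \mid b \geq 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{RD};\mathrm{Sig}} e_1 :: C_A \\
Md \mid b \geq 0 : \mathrm{nat} \mid \Omega; \Sigma, x : {\downarrow}{\uparrow}A@\mathrm{CK} \vdash_{\mathrm{Sig}} c :: \tau
\end{array}}
{Md \mid b > 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} \mathrm{let}\ x = e_1\ \mathrm{in}\ c :: \tau}
\;(\text{T-LET})
\]

\[
\frac{Md \mid b \geq 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{RD};\mathrm{Sig}} e :: C_A
  \qquad Md \mid b \geq 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{WT}} x :: {\downarrow}{\uparrow}A}
{Md \mid b > 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} x := e :: C_{\mathrm{unit}}}
\;(\text{T-ASSIGN})
\]

\[
\frac{\begin{array}{c}
\mathrm{Sig}' = \{Md \mid b \geq 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash c :: \tau\} \\
\mathrm{Sig}'' = \text{if } Md = \mathrm{jit} \text{ then } \mathrm{Sig}' \text{ else } \mathrm{Sig} \\
Md \mid b = 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}''} c :: \tau
  \qquad Md \mid b > 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} c :: \tau
\end{array}}
{Md \mid b \geq 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} c :: \tau}
\;(\text{T-ENOUGH?})
\]

Expressions

\[
\frac{\Omega = x : {\uparrow}A@q, \Omega' \qquad q \neq \mathrm{RD}}
{Md \mid b : \mathrm{nat} \mid \Omega \vdash_{\mathrm{WT}} x :: {\uparrow}A}
\;(\text{T-LOC-WRITE})
\]

\[
\frac{\Omega = x : {\uparrow}A@q, \Omega'}
{Md \mid b : \mathrm{nat} \mid \Omega \vdash_{\mathrm{RD}} x :: {\uparrow}A}
\;(\text{T-LOC-READ})
\qquad
\frac{}{Md \mid b : \mathrm{nat} \mid \Omega \vdash_{\mathrm{RD}} \mathrm{tt} :: {\uparrow}\mathrm{bool}}
\;(\text{T-BOOL-T})
\]

Fig. 8. Selected command and expression typing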


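Statement typing. Fig. 9 presents the typing rules for crash instructions. The
crash is detected by the depleted energy level b = 0 in the T-V-CRASH rule. In
the premise, the crash instruction ↓ε # in(b > 0, ↑κ') is typed. In JIT mode,
the T-JIT-STOP rule brings a checkpointed version of all the volatile variables
in Σ inside Ω, since they are checkpointed then. In atomic mode, the T-AID-STOP
rule simply drops the volatile locations in Σ. The T-CHARGE rule inputs a new
energy level from the energy channel ε, regardless of the mode. The first premise
shows that the energy channel is needed to provide a natural number greater
than zero. Finally, the T-JIT-RESTORE and T-AID-RESTORE rules prepare and
check the rebooted system in JIT and atomic modes, respectively. In both modes,
volatile memory is restored from the checkpointed locations in Ω. In the atomic
mode, the checkpointed locations persist in Ω as we may need them for the
next power failure. Alternatively, in the JIT mode, checkpoints are dropped
from Ω and execution continues with the expression or command κ, which was
running right before the crash. In the atomic mode, execution continues with
the original command c_0 enclosed in the atomic region. Instead of retyping the
restored judgments, we check if there are already typing derivations by matching
them up with the saved judgment in the signature.

\[
\frac{Md \mid \cdot \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} {\downarrow}\varepsilon \,\#\, \mathrm{in}(b > 0, {\uparrow}\kappa') :: {\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_\tau)}
{Md \mid b = 0 : \mathrm{nat} \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} \kappa' :: {\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_\tau) \lor {\downarrow}{\uparrow}\tau}
\;(\text{T-V-CRASH})
\]

\[
\frac{\Sigma = {\downarrow}\Sigma' \qquad \mathrm{jit} \mid \cdot \mid \Omega, \Sigma'_{ck} \vdash_{\mathrm{Sig}} \varepsilon \,\#\, \mathrm{in}(b > 0, {\uparrow}\kappa') :: (\mathrm{nat} \rightsquigarrow {\uparrow}C_\tau)}
{\mathrm{jit} \mid \cdot \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} {\downarrow}\varepsilon \,\#\, \mathrm{in}(b > 0, {\uparrow}\kappa') :: {\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_\tau)}
\;(\text{T-JIT-STOP})
\]

\[
\frac{\mathrm{aID}(c_0) \mid \cdot \mid \Omega \vdash_{\mathrm{Sig}} \varepsilon \,\#\, \mathrm{in}(b > 0, {\uparrow}\kappa') :: (\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}})}
{\mathrm{aID}(c_0) \mid \cdot \mid \Omega; \Sigma \vdash_{\mathrm{Sig}} {\downarrow}\varepsilon \,\#\, \mathrm{in}(b > 0, {\uparrow}\kappa') :: {\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}})}
\;(\text{T-AID-STOP})
\]

\[
\frac{\varepsilon \,\#\, \mathrm{in}() : \mathrm{nat} > 0 \qquad Md \mid b > 0 : \mathrm{nat} \mid \Omega \vdash_{\mathrm{Sig}} {\uparrow}\kappa' :: {\uparrow}C_\tau}
{Md \mid \cdot \mid \Omega \vdash_{\mathrm{Sig}} \varepsilon \,\#\, \mathrm{in}(b > 0, {\uparrow}\kappa') :: (\mathrm{nat} \rightsquigarrow {\uparrow}C_\tau)}
\;(\text{T-CHARGE})
\]

\[
\frac{\Omega = \Omega', \Omega''_{ck} \qquad (\mathrm{jit} \mid b \geq 0 : \mathrm{nat} \mid \Omega'; {\downarrow}\Omega'' \vdash \kappa' :: C_\tau) \in \mathrm{Sig}}
{\mathrm{jit} \mid b > 0 : \mathrm{nat} \mid \Omega \vdash_{\mathrm{Sig}} {\uparrow}\kappa' :: {\uparrow}C_\tau}
\;(\text{T-JIT-RESTORE})
\]

\[
\frac{\Omega = \Omega', \Omega''_{ck} \qquad (\mathrm{aID}(c_0) \mid b \geq 0 : \mathrm{nat} \mid \Omega; {\downarrow}\Omega'' \vdash c_0 :: C_{\mathrm{unit}}) \in \mathrm{Sig}}
{\mathrm{aID}(c_0) \mid b > 0 : \mathrm{nat} \mid \Omega \vdash_{\mathrm{Sig}} {\uparrow}\kappa' :: {\uparrow}C_{\mathrm{unit}}}
\;(\text{T-AID-RESTORE})
\]

Fig. 9. Crash, restore, and checkpoint typing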


5 Logical Relation for Intermittent Execution 


We establish a logical relation to prove idempotency, which states that every
intermittent execution of a program can be simulated by a continuous execution.
The logical relation relates an intermittent execution with a continuous
one and is indexed by Crash types. A continuous run is one with an infinite
energy level, ∞. Crash types are recursive, yielding possibly infinite atomic region
re-executions. Thus, we use the maximum number of executions (also power failures)
as a step index to stratify our logical relation to ensure its well-foundedness.

The logical relation (defined in Sec. 5.1) relies on PwOff, Restore, and Commit
functions, referred to as power failure, restore, and commit policies, respectively.
We establish specific policies for atomic and JIT execution modes. We
formalize semantic typing as every atomic and JIT region of the program being
logically related to itself. We prove that semantically well-typed programs
are idempotent across power failures in Sec. 5.2. The definitions match
the memory operations in the dynamic rules that deal with crash, restore,
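and re-execution (D-S-AID/D-S-JIT, D-R-AID/D-R-JIT, and D-P-CKPT/D-P-SEQ)
for atomic and JIT regions. We prove that our syntactically well-typed
programs are semantically well-typed. We generalize semantic typing rules,
allowing custom power failure, restore, and commit policies (Sec. 5.3).

\[
\begin{array}{l}
Md \mid b \geq 0 : \mathrm{nat} \mid \Omega \mid \Sigma \Vdash c_1 \leq c_2 :: C_{\mathrm{unit}} \\
\quad\text{iff } \forall n, m \geq 0.\ \forall \gamma, NV, V \text{ s.t. } NV \mid V \Vdash \gamma :: \Omega \mid \Sigma. \\
\qquad (\gamma \mid Md \mid n \mid NV \mid V \mid c_1,\ \gamma \mid Md \mid \infty \mid NV \mid V \mid c_2) \in \mathcal{E}[C_{\mathrm{unit}}]^{m}
\end{array}
\]

Term Relation

\[
\begin{array}{l}
\mathcal{E}[C_{\mathrm{unit}}]^{m+1} = \{(\gamma_1 \mid Md \mid n_1 \mid NV_1 \mid V_1 \mid c_1,\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \text{ s.t.} \\
\quad \exists (\gamma_1' \mid Md' \mid n_1' \mid NV_1' \mid V_1' \mid c_1') \text{ s.t. } \\
\qquad \gamma_1 \mid Md \mid n_1 \mid NV_1 \mid V_1 \mid c_1 \longrightarrow^{*}_{\mathrm{irred}} \gamma_1' \mid Md' \mid n_1' \mid NV_1' \mid V_1' \mid c_1' \ \land \\
\quad \exists (\gamma_2' \mid Md' \mid \infty \mid NV_2' \mid V_2' \mid c_2') \text{ s.t. } \\
\qquad \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2 \longrightarrow^{*} \gamma_2' \mid Md' \mid \infty \mid NV_2' \mid V_2' \mid c_2' \ \land \\
\qquad (\gamma_1' \mid Md' \mid n_1' \mid NV_1' \mid V_1' \mid c_1',\ \gamma_2' \mid Md' \mid \infty \mid NV_2' \mid V_2' \mid c_2') \in \mathcal{V}[C_{\mathrm{unit}}]^{m+1}\} \\[4pt]
\mathcal{E}[C_{\mathrm{unit}}]^{0} = \{(\gamma_1 \mid Md \mid n_1 \mid NV_1 \mid V_1 \mid c_1,\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2)\}
\end{array}
\]

Value Relation

\[
\begin{array}{l}
\mathcal{V}[{\uparrow}\mathrm{unit}]^{m} = \{(\gamma \mid Md \mid n_1 \mid NV_1 \mid \mathrm{skip},\ \gamma \mid Md \mid \infty \mid NV_2 \mid \mathrm{skip}) \text{ s.t. } NV_1 = NV_2\} \\[4pt]
\mathcal{V}[{\downarrow}{\uparrow}\mathrm{unit}]^{m} = \{(\gamma_1 \mid Md \mid n_1 \mid NV_1 \mid V_1 \mid \mathrm{skip},\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid \mathrm{skip}) \text{ s.t.} \\
\quad \mathrm{Commit}(\gamma_1 \mid Md \mid NV_1 \mid V_1) = \gamma_1' \mid NV_1' \ \land\ \mathrm{Commit}(\gamma_2 \mid Md \mid NV_2 \mid V_2) = \gamma_2' \mid NV_2' \ \land \\
\quad (\gamma_1' \mid Md \mid n_1 \mid NV_1' \mid \mathrm{skip},\ \gamma_2' \mid Md \mid \infty \mid NV_2' \mid \mathrm{skip}) \in \mathcal{V}[{\uparrow}\mathrm{unit}]^{m}\} \\[4pt]
\mathcal{V}[{\uparrow}C_{\mathrm{unit}}]^{m} = \{(\gamma_1 \mid Md \mid n \mid NV_1 \mid {\uparrow}\kappa,\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \text{ s.t.} \\
\quad \mathrm{restore}(\gamma_1, Md, NV_1, \kappa) = NV_0 \mid V_0 \mid c_0 \ \land \\
\quad (\gamma_1 \mid Md \mid n \mid NV_0 \mid V_0 \mid c_0,\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \in \mathcal{E}[C_{\mathrm{unit}}]^{m}\} \\[4pt]
\mathcal{V}[\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}}]^{m} = \{(\gamma_1 \mid Md \mid \cdot \mid NV_1 \mid \varepsilon \,\#\, \mathrm{in}(n > 0, {\uparrow}\kappa),\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \text{ s.t.} \\
\quad \forall n > 0.\ (\gamma_1 \mid Md \mid n \mid NV_1 \mid {\uparrow}\kappa,\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \in \mathcal{V}[{\uparrow}C_{\mathrm{unit}}]^{m}\} \\[4pt]
\mathcal{V}[{\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}})]^{m} = \{(\gamma_1 \mid Md \mid \cdot \mid NV_1 \mid V_1 \mid {\downarrow}\varepsilon \,\#\, \mathrm{in}(n > 0, {\uparrow}\kappa),\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \\
\quad \text{s.t. } \mathrm{PwOff}(\gamma_1, Md, NV_1, V_1) = \gamma_1' \mid V' \ \land \\
\quad (\gamma_1' \mid Md \mid \cdot \mid V', NV_1 \mid \varepsilon \,\#\, \mathrm{in}(n > 0, {\uparrow}\kappa),\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \in \mathcal{V}[\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}}]^{m}\} \\[4pt]
\mathcal{V}[C_{\mathrm{unit}}]^{m+1} = \{(\gamma_1 \mid Md \mid n_1 \mid NV_1 \mid V_1 \mid c_1,\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \text{ s.t. either} \\
\quad n_1 = 0 \ \land\ (\gamma_1 \mid Md \mid \cdot \mid NV_1 \mid V_1 \mid {\downarrow}\varepsilon \,\#\, \mathrm{in}(n_1 > 0, {\uparrow}c_1),\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \in \mathcal{V}[{\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}})]^{m}, \text{ or} \\
\quad n_1 > 0 \ \land\ (\gamma_1 \mid Md \mid n_1 \mid NV_1 \mid V_1 \mid c_1,\ \gamma_2 \mid Md \mid \infty \mid NV_2 \mid V_2 \mid c_2) \in \mathcal{V}[{\downarrow}{\uparrow}\mathrm{unit}]^{m}\}
\end{array}
\]

Fig. 10. Logical relation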


5.1 Semantic Typing via a Logical Relation 


The logical relation, written Md | b ≥ 0 : nat | Ω | Σ ⊩ c_1 ≤ c_2 :: C_unit, is defined
in Fig. 10 by a lexicographic induction on the index m and the structure of the
types. The judgment NV | V ⊩ γ :: Ω | Σ in the definition states that γ maps the
variables in Σ and Ω to locations in V and NV, respectively, such that their qualifiers
and types match. Similar to prior work [2,16,42], our definition consists of a term
relation E[C_unit]^m and a value relation V[τ]^m.


Term Relation. A pair of open command configurations of type C_unit are in
the term relation of index m if any intermittent execution of the first one after
m power failures is indistinguishable from a continuous execution of the second
one. In particular, for index m+1, the term relation relates two configurations at
type C_unit if the first configuration eventually steps to a value (or "irreducible")
configuration, i.e., it either evaluates to skip or its energy level depletes (n_1' = 0),
and the second configuration can take zero or more steps such that the pair continue
to be in the value relation V[C_unit]^{m+1}. When the index is m = 0,
no execution is observed, so any two configurations are in the term relation.
Here, irred refers to γ_1' | Md' | n_1' | NV_1' | V_1' | c_1' being an irreducible configuration,
i.e., it cannot take any more steps. Since our semantics for commands is deterministic,
for each configuration γ_1 | Md | n_1 | NV_1 | V_1 | c_1 there is exactly one such
irreducible configuration.


Value Relation. The value relation is defined based on the intended meaning
of the type, and relates two value configurations that will have the same effect
on the stores. The value relation relates two open command configurations at
type C_unit and index m + 1 if either (a) the first configuration has faced a power
failure, and the two configurations continue to relate by V[↓(nat ⇝ ↑C_unit)]^m,
or (b) the first configuration executed successfully without any power failures,
and the two configurations are related by V[↓↑unit]^m. This definition matches
the disjunctive nature of type C_unit, which is recursively defined in the signature
as ↓(nat ⇝ ↑C_unit) ∨ ↓↑unit. Since we unfold the recursive definition of C_unit,
we decrease the index from m+1 to m to ensure the relation's well-foundedness.
Note that the value relation is neither defined nor called for C_unit at index 0.
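The way the step index is consumed can be summarized as a chain of dependencies between the clauses of Fig. 10. The display below only restates which relation each clause refers to, as described in the text; it is a reading aid, not an additional definition, and it assumes the `amsmath` and `amssymb` packages for `align*` and `\rightsquigarrow`.

```latex
\begin{align*}
C_{\mathrm{unit}} \;&=\; {\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}}) \;\lor\; {\downarrow}{\uparrow}\mathrm{unit} \\[4pt]
\mathcal{E}[C_{\mathrm{unit}}]^{m+1} \;&\text{refers to}\; \mathcal{V}[C_{\mathrm{unit}}]^{m+1} \\
\mathcal{V}[C_{\mathrm{unit}}]^{m+1} \;&\text{refers to}\; \mathcal{V}[{\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}})]^{m}
   \;\text{or}\; \mathcal{V}[{\downarrow}{\uparrow}\mathrm{unit}]^{m} \\
\mathcal{V}[{\downarrow}(\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}})]^{m} \;&\text{refers to}\; \mathcal{V}[\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}}]^{m} \\
\mathcal{V}[\mathrm{nat} \rightsquigarrow {\uparrow}C_{\mathrm{unit}}]^{m} \;&\text{refers to}\; \mathcal{V}[{\uparrow}C_{\mathrm{unit}}]^{m} \\
\mathcal{V}[{\uparrow}C_{\mathrm{unit}}]^{m} \;&\text{refers to}\; \mathcal{E}[C_{\mathrm{unit}}]^{m} \\
\mathcal{V}[{\downarrow}{\uparrow}\mathrm{unit}]^{m} \;&\text{refers to}\; \mathcal{V}[{\uparrow}\mathrm{unit}]^{m}
\end{align*}
```

Each pass through the crash branch therefore returns to the term relation at a strictly smaller index, and the relation at index 0 holds trivially.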


The value relations in the third, fourth, and fifth rows of Fig. 10 are defined 
based on the type of the first configuration; the second configurations in these 
relations continue to be of type Cunit. Only in the relations defined in the first 
and second rows of Fig. 10 do the types of both configurations match the indexed 
type of the relation. Hence, the value relation has varying arity: in the first and 
second rows of Fig. 10, the relation is binary while in the rest, the relation 
degenerates to unary, with the second configuration as its Kripke world [18]. 


The value relation at type ↓(nat ⇝ ↑C_unit) relates two configurations if the
first one runs the crash instruction ↓ε # in(n > 0, ↑κ) and a power failure policy
creates a checkpoint of volatile locations such that the configurations continue
to be in the value relation at type (nat ⇝ ↑C_unit). The power failure function
in atomic mode is defined to checkpoint none of the volatile locations, i.e.,
PwOff(γ, aID(c_0), NV_1, V_1) = γ' | ∅, where γ' is the largest restriction of γ with
range(γ') = dom(NV_1), and defined to checkpoint all volatile locations in JIT
mode, i.e., PwOff(γ, jit, NV_1, V_1) = γ | V_1.

The value relation at type (nat ⇝ ↑C_unit) is defined similarly to a function
type in a value relation and requires the configurations to be related at type
(↑C_unit) for every energy input level n provided to the first configuration.
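The two power failure policies described above can be sketched on dictionaries. The encoding is an assumption for illustration: γ maps variables to location names, memories map locations to values, and a checkpointed location ℓ_ck is stored under the key `("ck", ℓ)`; the domain of NV is taken modulo the checkpoint mark, which is an interpretation made here and not something the text states.

```python
def pw_off(gamma, mode, nv, v):
    """Sketch of PwOff: returns (restricted gamma, volatile locations to checkpoint)."""
    if mode == "jit":
        # JIT: keep the mapping and checkpoint every volatile location
        return dict(gamma), dict(v)
    # atomic: checkpoint nothing and keep only variables whose location lives in NV
    nv_dom = {key[1] if isinstance(key, tuple) else key for key in nv}
    return {x: loc for x, loc in gamma.items() if loc in nv_dom}, {}
```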

The value relation at type ↑C_unit requires the first configuration to run the
crash instruction ↑κ. The defined restore policy restores the nonvolatile memory
NV_0, volatile memory V_0, and re-execution command c_0 such that the configurations
continue to be related in the term interpretation at type C_unit. In
atomic mode, the restore function is defined as restore(γ, aID(c), NV_1, κ) =
NV_1 | NV'' | c, where NV_1 = NV', NV''_ck. In JIT mode, the restore function
is defined as restore(γ, jit, NV_1, κ) = NV' | NV'' | κ, where NV_1 = NV', NV''_ck.
We write NV_1 = NV', NV''_ck to state that NV_1 can be uniquely partitioned into
all locations (NV''_ck) that are checkpointed, i.e., of the form ℓ_ck, and regular
locations (NV') of the form ℓ. NV'' is the non-checkpointed version of NV''_ck, which
can be retrieved by removing the ck subscript from every location in NV''_ck.
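The restore policies can be sketched in the same dictionary encoding, again only as an illustration: checkpointed locations are keys of the form `("ck", ℓ)`, and the function name and parameters are hypothetical.

```python
def restore(mode, nv, kappa, c0=None):
    """Sketch of the restore policy: returns (NV_0, V_0, command to run).

    mode  -- "jit" or "atomic"
    nv    -- nonvolatile memory; checkpointed locations are keys ("ck", loc)
    kappa -- the interrupted command
    c0    -- the original command of the atomic region (unused in JIT mode)
    """
    regular = {k: val for k, val in nv.items() if not isinstance(k, tuple)}
    unchecked = {k[1]: val for k, val in nv.items() if isinstance(k, tuple)}
    if mode == "jit":
        # JIT: drop the checkpoints from NV and resume the interrupted command
        return regular, unchecked, kappa
    # atomic: keep the checkpoints in NV and restart from the original command
    return dict(nv), unchecked, c0
```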

The value relation at type ↓↑unit requires both configurations to run skip,
and the defined commit policy creates nonvolatile memories for both runs such
that they continue to be related at type ↑unit. In atomic mode, the commit
function is defined to replace the checkpointed locations in the nonvolatile memory
with their volatile log, i.e., Commit(γ | aID(c_0) | NV_1 | V_1) = γ' | NV_1' | V'',
where NV_1 = NV_1', NV''_ck and V_1 = V_1', V'' and dom(V'') = dom(NV''). Moreover,
γ' ⊆ γ, with range(γ') = dom(NV_1') ∪ dom(V''). In JIT mode, the commit
function simply drops all volatile memory, i.e., Commit(γ | jit | NV_1 | V_1) = γ' |
NV_1, with γ' ⊆ γ and range(γ') = dom(NV_1).
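The commit policies admit a similar sketch. As before, the dictionary encoding, the `("ck", ℓ)` keys, and the function name are assumptions made for illustration; the atomic case relies on the stated condition dom(V'') = dom(NV''), so a missing log entry raises a `KeyError`.

```python
def commit(gamma, mode, nv, v):
    """Sketch of Commit: returns (restricted gamma, committed nonvolatile memory)."""
    if mode == "jit":
        new_nv = dict(nv)                     # JIT: volatile memory is dropped
    else:
        # atomic: replace each checkpointed location by its volatile log
        new_nv = {k: val for k, val in nv.items() if not isinstance(k, tuple)}
        for k in nv:
            if isinstance(k, tuple):
                new_nv[k[1]] = v[k[1]]
    new_gamma = {x: loc for x, loc in gamma.items() if loc in new_nv}
    return new_gamma, new_nv
```

Variables that pointed to purely volatile locations, such as temporaries introduced by let, are removed from the mapping in both modes.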

The value relation at type ↑unit requires the successful executions to store
the same values in their memories, i.e., NV_1 = NV_2.

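Semantic Typing. A program is semantically well-typed if every JIT and
atomic region of it is self-related under our logical relation.

\[
\frac{\mathrm{jit} \mid b \geq 0 : \mathrm{nat} \mid \Omega; \cdot \Vdash c \leq c :: C_{\mathrm{unit}}
  \qquad b : \mathrm{nat} \mid \Omega \Vdash p :: {\uparrow}C_{\mathrm{unit}}}
{b : \mathrm{nat} \mid \Omega \Vdash c; p :: {\uparrow}C_{\mathrm{unit}}}
\;(\text{P-SEQ-SEMANTIC})
\]

\[
\frac{\begin{array}{c}
\Omega_0 \mid \Sigma_0 = \mathrm{InitWorld}_s(\Omega; \rho) \\
\mathrm{aID}(c_0) \mid b \geq 0 : \mathrm{nat} \mid \Omega_0; \Sigma_0 \Vdash c_0 \leq c_0 :: C_{\mathrm{unit}}
  \qquad b : \mathrm{nat} \mid \Omega \Vdash p :: {\uparrow}C_{\mathrm{unit}}
\end{array}}
{b : \mathrm{nat} \mid \Omega \Vdash \mathrm{Ckpt}[\mathrm{aID}, \rho](c_0); p :: {\uparrow}C_{\mathrm{unit}}}
\;(\text{P-CKPT-SEMANTIC})
\]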


5.2 Semantic Typing for Idempotency 


The fundamental theorem of our logical relation states that syntactically well- 
typed programs are also semantically well-typed by proving that syntactically 
well-typed JIT and atomic regions are self-related. We state and prove the theo- 
rem in Sec. 6 but devote this section to explaining why being self-related implies 
idempotency. We explain it separately for JIT and atomic blocks. 

Stepping a JIT block. Consider a program of form [χ_1 ▷ ε] ⊗ γ_1 | n | NV_1 | c_1; p
that can take a step to [χ_k ▷ ε] ⊗ γ_1 | n_k' | NV_k' | p via the D-P-SEQ rule. By
the D-P-SEQ rule, we know that the command c_1 is successfully executed to
completion with possibly m-many power failures along the way: [χ_1 ▷ ε] ⊗ γ_1 |
jit | n | NV_1 | · | c_1 ⟹* [χ_k ▷ ε] ⊗ γ_k' | jit | n_k' | NV_k' | V_k' | skip. Our goal is to
simulate this execution in a continuous setting. To model a continuous run, we
run the configuration with ∞ as the energy level: [χ ▷ ε] ⊗ γ_1 | jit | ∞ | NV_1 | · |
c_1 ⟹* [χ ▷ ε] ⊗ γ_j' | jit | ∞ | NV_j' | V_j' | skip.

Fig. 11 shows the construction of the simulation. We start with the assumption
that the configuration with energy level n is self-related when given energy
level ∞ for every index, including m + 1 (point (1) in Fig. 11). We show that
if the first configuration takes one or more steps, the second configuration can
take zero or more steps so that the intermediate regions continue to relate.


By definition of the term interpretation, c_1 in the first configuration is executed
until the first power failure occurs. Moreover, by the relation, we can
execute c_1 in the second configuration, too, such that the resulting configurations
remain related (point (2) in Fig. 11) by the value interpretation at type
C_unit. The first configuration takes a step from point (2) to point (3) using the
D-CRASH rule by the computational semantics. By the definition of the logical
relation, the two configurations continue to be related by the value interpretation
at type ↓(nat ⇝ ↑C_unit). Then the first configuration takes a step from point (3)
to point (4) by the D-S-JIT rule; in this case, we know (by the assumptions of
the rule) V' = V_1' and γ_1' = γ. This matches the definition of the power-off policy
for JIT blocks (see Sec. 5.1), and thus the two configurations remain related by
the value relation at type nat ⇝ ↑C_unit. Next, the first configuration takes a
step to point (5) by inputting a new energy level from the environment (n_2). By
the definition of the value relations, the two configurations will remain related
by the value interpretation at type ↑C_unit.


Finally, the configuration steps to point (6) by D-RESTORE-JIT, which copies
all checkpointed locations into the volatile memory and continues by running
the interrupted command κ, i.e., here NV_0 = NV_1' and V_0 = V' = V_1', and c_0 = κ.
This matches the restore policy defined for JIT regions; thus, the configurations
continue to be related by the term relation at type C_unit, similar to what we had
earlier at point (1) in Fig. 11, but with fewer power failures remaining.


Now, when the first configuration finally steps to point (8), by the definition
of the logical relation, we know that the second configuration steps into skip too.
Thus, we can apply the D-Ckpt rule on the second configuration. The volatile
memory V_j' is dropped, and the mapping is reset to γ_1, i.e., it matches the commit
policy defined for JIT blocks in the logical relation. By Fig. 11-d, we get NV_k' =
NV_j', which completes deriving our goal.


Stepping an atomic region. We can build the desired simulation by tak- 
ing the same steps described for a JIT region. Similarly, the key point is that 
the power-off and restore policies exactly match how the rules D-S-AID and 
D-RESTORE-AID, respectively, handle nonvolatile and volatile memories, and 
the commit policy corresponds to the FinWorld function in the D-CKPT rule. 


We showed that our logical relation ensures idempotency for JIT and atomic
regions. In the next section, we show that our logical relation formalizes a semantic
typing to ensure idempotency of more general policies.

[Figure: simulation diagram pairing each state of the intermittent run with a state
of the continuous run (energy ∞). (1) initial configurations, related by E[C_unit]^{m+1};
(2) first run out of energy (n = 0), related by V[C_unit]^{m+1};
(3) crash state ↓ε # in(b > 0; ↑c_1'), related at type ↓(nat ⇝ ↑C_unit);
(4) waiting for energy, related at type nat ⇝ ↑C_unit, where χ_1 = n_2 :: χ_2;
(5) restore state ↑c_1' with energy n_2, related at type ↑C_unit;
(6) restored configurations, related by E[C_unit]^m; (7) later irreducible configurations,
related by V[C_unit]; (8) both runs at skip, related at type ↓↑unit;
(c) committed configurations, related at type ↑unit; (d) equal nonvolatile memories.]

Fig. 11. Why the logical relation is enough.


5.3 More General Policies 


We utilize our semantic typing to allow custom policies for power failure, restore,
and commit. We extend the grammar of programs as p ::= ⋯ | Reg[aID, arg](c); p,
where arg refers to the arguments that the programmer decides to pass to
the region for initialization. To each region, we assign a unique identifier aID
that is associated with the three policies and two functions InitGeneral_s and
InitGeneral_d to initialize the static and dynamic memories, respectively. We
add the following semantic typing rule for the general regions:

\[
\frac{\begin{array}{c}
c_0 \mid \Omega_0 \mid \Sigma_0 = \mathrm{InitGeneral}_s(\Omega; \mathrm{aID}; c; \mathit{arg}) \\
\mathrm{aID}(c_0) \mid b \geq 0 : \mathrm{nat} \mid \Omega_0; \Sigma_0 \Vdash c_0 \leq c_0 :: C_{\mathrm{unit}}
  \qquad b : \mathrm{nat} \mid \Omega \Vdash p :: {\uparrow}C_{\mathrm{unit}}
\end{array}}
{b : \mathrm{nat} \mid \Omega \Vdash \mathrm{Reg}[\mathrm{aID}, \mathit{arg}](c); p :: {\uparrow}C_{\mathrm{unit}}}
\;(\text{P-REG-SEMANTIC})
\]

For a self-related region to be idempotent, its policies Commit, PwOff, and 
Restore must match the dynamics, so we add dynamic rules for custom regions 
in Fig. 12. The JIT and atomic region policies and their dynamic rules are 
instances of these general policies. As an example, the programmer can customize 
the policies of the first block of Fig. 1 to not checkpoint variable u. The program 
remains idempotent as the atomic region never reads u before writing to it. This 
yo | NVo | Vo | co = restore(NV, V, «, Md, y) i
x> ee] 89y|Ma| n| NV|tTs > [x> e]@ 70 |Ma |n | NVo | Vo | co 


D-R-REG) 


n>0 InitGenerala(NV;alD; c; y; ag) = co, NVo, Vo 
[x > £] @ alD(co) | n | NVo | Vo | co >* [x’ > £] @ alD(co) | n’ | NV’ | V’ | skip 
n'>0 NVı = Commit(NV’; V’; alD; arg) 
[xb e]@7|n| NV | Reg[(alD; arg)|(c);p => [xX be] 87 |n | NVi |p 


(D-REG) 


V’ = PwOff (NV, V, Md, y) 
[x> Ee] 8 vy |Ma |-| NV |V |}e #in(b >0;fs) => 
[x> £] 8 y |Ma |- | NV, V’ | £ # in(b > 0; TK) 


(D-S-REG) 


Fig. 12. Custom dynamic rules 


policy is implemented by real systems [23,24,41]. Our static typing rules can be 
extended to reason about them as shown in the companion technical report. 
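
The claim that a variable written before it is read need not be checkpointed can be illustrated with a toy model. The sketch below is not the paper's formal semantics: the names `region`, `restore`, `commit`, the two-step region bodies, and the list of power budgets are all assumptions made for illustration. Checkpointed variables live in a volatile copy that is dropped on power failure and written back on commit; all other writes go straight to nonvolatile memory.

```ocaml
module M = Map.Make (String)

(* nv: nonvolatile memory; v: volatile copies of the checkpointed variables *)
type state = { nv : int M.t; v : int M.t }

let read st x =
  match M.find_opt x st.v with Some n -> n | None -> M.find x st.nv

let write st x n =
  if M.mem x st.v then { st with v = M.add x n st.v }
  else { st with nv = M.add x n st.nv }

(* Restore (hypothetical policy): rebuild the volatile copies from NV. *)
let restore ckpt nv =
  { nv; v = List.fold_left (fun v x -> M.add x (M.find x nv) v) M.empty ckpt }

(* Commit (hypothetical policy): write the volatile copies back into NV. *)
let commit st = M.union (fun _ vol _ -> Some vol) st.v st.nv

(* Run steps until they are exhausted or the power budget reaches zero. *)
let rec run budget steps st =
  match steps with
  | [] -> (st, true)
  | s :: rest ->
    if budget = 0 then (st, false) else run (budget - 1) rest (s st)

(* One power budget per failing attempt; the last attempt runs to completion. *)
let rec region ckpt steps budgets nv =
  let st = restore ckpt nv in
  match budgets with
  | [] -> commit (fst (run max_int steps st))
  | b :: bs ->
    let st', finished = run b steps st in
    (* On power failure the volatile part st'.v is dropped. *)
    if finished then commit st' else region ckpt steps bs st'.nv

let nv0 = M.add "x" 1 (M.add "u" 0 M.empty)

(* u is written before it is read, so only x is checkpointed. *)
let body =
  [ (fun st -> write st "u" 5);
    (fun st -> write st "x" (read st "x" + read st "u")) ]

(* x is read before it is written: leaving it out of the checkpoint is unsafe. *)
let unsafe_body =
  [ (fun st -> write st "x" (read st "x" + 1));
    (fun st -> write st "u" (read st "x")) ]

let continuous = region [ "x" ] body [] nv0
let intermittent = region [ "x" ] body [ 1 ] nv0
let () = assert (M.equal ( = ) continuous intermittent)
```

In this model, running `unsafe_body` with an empty checkpoint set and one power failure increments x twice, so the intermittent result differs from the continuous one; checkpointing x removes the difference.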


6 Metatheory 


This section establishes the main properties of the system, which are progress and 
preservation, adequacy, and the most important result: the fundamental theorem 
where we prove that statically well-typed programs are semantically well-typed. 
The theorems and their complete proofs are provided in the companion TR [15]. 

The progress and preservation theorems assume memory locations to be well-
formed, ⊢^Md_γ NV | V : Ω | Σ, which is defined similarly to the NV | V ⊩ γ : Ω | Σ
used in the logical relation, but imposes extra conditions based on the execution
mode Md. It states that γ maps variables in contexts Ω and Σ to the nonvolatile
and volatile memories, NV and V, respectively, such that their qualifiers and the
type of the stored values match. Moreover, it requires specific properties on the
contexts depending on Md; in atomic mode, each checkpointed location in NV
and Ω must have copies in V and Σ. We state the theorems below.


Theorem 1 (Progress for Commands). If Md | b R m : nat | Ω; Σ ⊢_sig c : τ,
then ∀n : nat with n R m and ∀γ, NV, V with ⊢^Md_γ NV | V : Ω | Σ, either γ | Md |
n | NV | V | c is a value, or for some configuration γ' | Md' | n' | NV' | V' | c' we
have γ | Md | n | NV | V | c ⟹ γ' | Md' | n' | NV' | V' | c'. Moreover, if Md is an
atomic mode, we have NV' = NV.


Theorem 2 (Preservation for Commands). If Md | b > 0 : nat | Ω; Σ ⊢_sig
c : τ, and for some ⊢^Md_γ NV | V : Ω | Σ and n : nat > 0, we have γ | Md |
n | NV | V | c ⟹ γ' | Md | n' | NV' | V' | c', then for some Σ1, we have
Md | b > 0 : nat | Ω; Σ1 ⊢_sig c' : τ, where ⊢^Md_γ' NV' | V' : Ω | Σ1 and n' > 0.


Theorem 3 (Fundamental Theorem). Ifb: nat | R F p: Cuit, then 
b : nat | Ql p: tCunit- 
[Figure: chain of implications between related configurations, points (1) to (6), justified by progress, preservation, and the cases of the value relation]


Fig. 13. Proof of the fundamental theorem for P-Ckpt 


The proof of Theorem 3 is by induction on the static typing derivation for p 
and considers the last step in the derivation. Fig. 13 explains the idea of the 
proof for the case where P-Ckpt is the last step of the derivation. By inversion, 
p = Ckpt[alD, p](c); p’. Also, c is well-typed for static contexts Q’ and X, where 
Q = 2", Six. The goal is to establish point (1) in the figure: c is related to 
itself in the term interpretation for arbitrary n, m, y, NV and V where NV | V IF 
YER”, Xex | X. The last condition enforces that the static contexts match the 
dynamic context. The condition also establishes the more refined well-formedness 
condition that ⊢^Md_γ NV | V : Ω | Σ in atomic mode, required by progress and
preservation, since it enforces that each checkpointed location in NV and Ω has
copies in V and Σ. In particular, NV = NV', V and range(γ) = dom(NV).
When m = 0, the proof is trivial. Consider the case where m = k + 1. By the 
progress and preservation theorems, the first configuration can take multiple 
steps until it becomes a value γ1 | aID(c) | n' | NV | V1 | c1 that continues to be
well-typed. If n’ > 0, the second configuration steps similarly to completion and 
establishes that the two resulting configurations are in the value relation. This 
case is not shown in the figure. If n’ = 0, the second configuration does not step 
and instead reaches point (2) in Fig. 13. At point (2), the proof must show that 
the configurations are in the value interpretation at type C_unit.

The dashed line in the figure states that establishing point (2) implies the 
relation in point (1). The cascade of implications (dashed lines) follows the def- 
inition of the value relations at each type. At each step, we invert on the typing 
rule of the open configuration and show that runtime memories stay well-defined 
for static contexts. At point (4), we apply the power failure policy for atomic
regions, which drops the volatile memory V1 and creates a mapping using the
domain of NV. By the prior conditions established, we know the created mapping
is the original mapping γ. At point (6), we apply the restore policy for
atomic regions, which creates a new volatile memory based on NV. Again by the 
prior conditions established, we know the volatile memory created is the original
volatile V. The goal at point (6) is similar to our original goal at point (1), except 
that the proof uses an inductive argument to relate the two configurations at k. 

Finally the Adequacy Theorem states that semantically well-typed programs 
are idempotent, defined below. The proof is illustrated in Section 5.2. 


Definition 1 (Idempotency). A triple of a program p, nonvolatile memory
NV, and a mapping γ is idempotent, if every intermittent execution of the program
can be simulated by a continuous execution of it: for all n, n', x1, x1', NV', p',
if [x1 ▷ ε] @ γ | n | NV | p ⟹* [x1' ▷ ε] @ γ | n' | NV' | p', then
[x2 ▷ ε] @ γ | ∞ | NV | p ⟹* [x2' ▷ ε] @ γ | ∞ | NV' | p'.


Theorem 4 (Adequacy). Consider b : nat | Ω ⊩ p : C_unit, a nonvolatile memory
NV and a bijective map γ that matches qualifiers and types from variables
in Ω to locations in NV. The triple of p, NV, and γ is idempotent.


7 Discussion & Related Work 


Intermittent Computing. Surbatovich et al. [41] provide the first formal 
framework for reasoning about intermittent execution, give the correctness defi- 
nition that we use, and identify precise memory invariants needed for an execu- 
tion to be correct. Our Crash types capture some of these invariants; capturing 
all requires reasoning about the effects of non-deterministic sensor inputs, which 
we leave to future work. This work is the first to treat intermittent operations 
at the type level and explore the logical interpretation of intermittent execution. 
We speculate that our type-based approach using logical relations will provide 
a cleaner foundation for reasoning about the correctness of more complex inter- 
mittent systems, e.g., concurrent ones. Other works that investigate the formal 
properties of intermittent computing either reason about the effects of intermit- 
tent execution on peripheral interactions [9] or enforce timeliness constraints on 
sensor readings [40], which are orthogonal to ours. 

Adjoint Logic. Benton et al. [7,8] provided the first categorical foundation for 
using adjoint functors to combine linear and nonlinear logics and showed that a 
well-behaved calculus requires an independence principle: linear formulae cannot 
appear in the assumptions of a nonlinear sequent. Follow-up works further generalized
the system [20,21,36]. There, the relation to Pfenning and Davies’s [30]
formulation of the lax ○ modality was noted; ○ corresponds to UF, where F and
U are adjunctions between truth and validity categories. Short of a full Curry-Howard
correspondence for our type system and underlying logic, we designed
the rules for ↑ and ↓ based on the above calculi. Our stable and unstable contexts
correspond to the validity and truth contexts respectively. Thus, we speculate
that the combination ↑↓ in our system corresponds to the lax modality.

Several prior works used type systems with adjoint modalities to model 
switching between program modes [6,14,34], e.g., switching a process's mode
between shared and unshared [6], or adding multicasting, replicable services,
and cancellation modes to a session-typed message passing system [34]. We are
the first to use these modalities to handle unforeseen shut-downs and distinguish
between stable and power-failure prone modes. 

Logical Relations. Prior work [3,42] uses step indexing to ensure the well- 
foundedness of logical relations that handle heaps with cyclic references, dynamic 
memory allocation, or recursive types. Our Crash types model the infinite com- 
putation that an atomic region can experience under a non-deterministic number 
of power failures and re-executions. This recursion necessitates an indexed relation
that limits the number of execution attempts a program can make.

Jung and Tiuryn introduced a logical relation for lambda definability that 
allows varying arities [18]. The idea is to increase the arity when passing to 
later worlds instead of starting with a large arity. Our logical relation can also 
be viewed as a relation with different arities; the initial type of the relation is 
binary, while after a crash the type of the value relation only corresponds to 
the intermittent configuration. During these value steps, the relation is unary, 
with the continuous configuration acting as a Kripke world for the intermittent
configuration. After restoration, the relation reverts to binary. 

Logical relations have been widely used to prove program equivalence, e.g., 
[2,3,10,16]. At a high level, idempotency is similar to program equivalence, but 
it handles re-execution and requires us only to prove simulation from an inter- 
mittent to continuous run, not vice-versa. 

Algebraic Effect Handlers. Algebraic effect handlers [27,31,32,33] give a uni- 
fied theory for computational effects, e.g., exceptions and interactive input /out- 
put. A handler accesses the continuation to transform the computation. Follow- 
ing effect handler syntax, we write effectful environmental interactions of our 
system as c#in(b > 0,t«), where b refers to a natural number returned by the 
environment and tx is the continuation. Our restore policy resembles a handler, 
in that it has access to the continuation, but an atomic region may dismiss the 
continuation, restarting from a saved command. 

Crash Hoare Logic. Crash Hoare logic (CHL) [11] ensures the correctness of 
crash and restore operations in a file system. CHL extends Hoare logic with a 
crash condition and a recovery procedure. The crash condition states what hap- 
pens to the state on a crash. The recovery procedure runs after the crash and 
manipulates the state before resuming. The system checks that if the program 
crashes, the storage system will recover to a state consistent with the specifica- 
tions. Unlike us, they do not care about idempotency, requiring manual effort 
to formalize the crash condition and recovery policy. Our syntactic typing fixes 
the power failure, restore, and commit policies, and our formal results guarantee 
that following the policies ensures idempotency, the common correctness con- 
dition for intermittent execution. We also allow the programmer to formalize 
bespoke semantically well-typed policies. 


8 Conclusion 


This work provides the first logical interpretation of intermittent execution. It 
shows that adjoint logic can be applied to define Crash types, which internalize 
the dualities between stable and unstable values, and complete versus partial
(re-)executions of intermittent programs. The typing constraints capture invari- 
ants of power failure, restoration, and re-execution in intermittent systems. The 
proofs of progress, preservation, and the fundamental theorem imply the cor- 
rectness of intermittent systems, i.e. idempotency of execution. 
Abstract. Tensor shape mismatch is a common source of bugs in deep
learning programs. We propose a new type-based approach to detect 
tensor shape mismatches. One of the main features of our approach is 
the best-effort shape inference. As the tensor shape inference problem 
is undecidable in general, we allow static type/shape inference to be 
performed only in a best-effort manner. If the static inference cannot 
guarantee the absence of the shape inconsistencies, dynamic checks are 
inserted into the program. Another main feature is gradual typing, where 
users can improve the precision of the inference by adding appropriate 
type annotations to the program. We formalize our approach and prove 
that it satisfies the criteria of gradual typing proposed by Siek et al. in 
2015. We have implemented a prototype shape checking tool based on 
our approach and evaluated its effectiveness by applying it to some deep 
neural network programs. 


1 Introduction 


Tensor Shape Checking and Its Difficulties. Tensor shape mismatch is 
one of the common sources of dynamic errors in programs using tensors (i.e., 
multi-dimensional arrays). For example, the reshape operation of tensors takes a 
tensor x and an integer list S and returns a new tensor of the shape S obtained 
by realigning the elements in x. The input and output tensors must have the 
same number of elements; a tensor of shape [2;3;4]¹ can be reshaped into a
shape [3; 2; 4], while trying to reshape it into [3; 4] results in a runtime error. 
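
The element-count condition above can be sketched as a small check on integer lists. This is a minimal illustration, not OCaml-Torch code: the helper names `prod` and `reshapeable` are chosen here for readability, and the check ignores the special size −1.

```ocaml
(* Product of all dimension sizes; the empty shape has product 1. *)
let prod (shape : int list) : int = List.fold_left ( * ) 1 shape

(* A reshape is valid when both shapes hold the same number of elements. *)
let reshapeable (src : int list) (dst : int list) : bool =
  prod src = prod dst

let () =
  (* 2 * 3 * 4 = 24 = 3 * 2 * 4, so this reshape is allowed. *)
  assert (reshapeable [2; 3; 4] [3; 2; 4]);
  (* 3 * 4 = 12 differs from 24, so this one is a shape mismatch. *)
  assert (not (reshapeable [2; 3; 4] [3; 4]))
```
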
Early detection of tensor shape mismatch errors is critical in particular for 
deep learning programs, where tensors are frequently used. Since deep learning 
programs often take a considerable amount of time to train networks, it is often 
the case that a program takes hours or days to compute the weights of deep
neural networks only to be terminated by one tensor shape mismatch error, 
throwing away the trained weights. Even worse, some tensor shape mismatches 
can be harder to notice: mixing up the height and the width of square images does 
not raise runtime errors but degrades the performance of the neural network. 
The existing work on static detection of tensor shape mismatch errors 
can be classified into two categories. One is the whole-program analysis ap- 
proach [17,31], which collects tensor shape information by partially evaluating 


¹ In this paper, we denote lists in the OCaml style, as in [1;2;3], to distinguish them
from citations.


© The Author(s) 2023 
T. Wies (Ed.): ESOP 2023, LNCS 13990, pp. 197-224, 2023. 
https://doi.org/10.1007/978-3-031-30044-8_8




1 let model s =
2   let f = ... in let g = ... in fun x -> let y = f x in g y
3 let _ = model 1 (Tensor.rand [20])


Fig. 1. An OCaml program written with OCaml-Torch. 


the program in the style of abstract interpretation. The other is the type-based 
approach [3,25], which expresses the shapes of tensors as a part of the type infor- 
mation. Still, none of them is fully satisfactory: either they are too conservative 
and reject valid programs, or fail to detect some shape mismatch errors. 

This paper pursues the type-based approach as it is expected to provide
modular detection of tensor shape inconsistencies. Designing an appropriate 
type system and a type inference procedure to reason about tensor shapes is 
challenging because shapes are first-class objects. For example, the library func- 
tion Tensor.zeros of OCaml-Torch [4] (which provides OCaml bindings for 
libtorch [20]) takes a list S of integers, and returns a new tensor whose shape is
S. Thus, we have to work with dependent types: Tensor.zeros would be given
the type S:int list → {r : tensor | r.shape = S}. It is difficult to infer
such dependent (refinement) types fully automatically. Yet, we wish to avoid 
programmers’ burden of writing too many type annotations. 

Another difficulty is that shape constraints can be so complex that even type 
checking, let alone inference, can be too costly or impossible. For instance, the 
reshape operation explained earlier needs the proof that the shape of the input 
tensor x is compatible with the given shape S = [s_1; ...; s_n] (i.e., if the shape
of x is [s'_1; ...; s'_m], then $\prod_i s'_i = \prod_i s_i$ holds)². Thus, type checking
requires complex reasoning about (non-linear) integer arithmetic and lists. 


Overview of Our Approach. Based on the observations above, we propose an 
approach that is expected to work well in practice despite the above-mentioned 
difficulties. Our approach can be characterized by three main features: best-effort 
type inference, hybrid type checking, and gradual typing [27]. We explain them 
using our prototype tool GRATEN³.


Best-Effort Type Inference. GRATEN does not try to infer the most general 
types; it performs type/shape inference in a best-effort manner. Thanks to this 
design choice, GRATEN works even if no type annotations are provided (de- 
spite that the underlying type system involves dependent types), and yet it can 
statically detect (not necessarily all but) some shape mismatch errors. 

As an example, let us consider the program in Figure 1. The function model 
takes an integer parameter s, defines functions f and g, and returns a layer 
(which is a function that takes a tensor and returns a tensor) which composes f 


² Actually, some s_i can be −1, in which case the size of the i-th dimension is unspecified.

³ The tool is publicly available at https://doi.org/10.5281/zenodo.7590480. The
source code is also publicly available at https://github.com/momohatt/graten.
let model s =
  let f = ... in let g = ... in
  fun x -> let y = if s = 1 then x else f x in g y

Fig. 2. The program from Figure 1 with a small modification.

let model s =
  let f = ... in let g = ... in
  fun x -> let y = if s = 1 then x else f x in
  g (assert (y.shape = [10]); y)


Fig. 3. The program returned by GRATEN given the program in Figure 2. 


and g. The definitions of f and g are omitted here, but their types are assumed 
as below, where s in the type of f is the argument of model and the function 
nth(n, S) returns the n-th element of the list S (the index starts with 0). 


f : x:{v : tensor | len(v.shape) = 1} → tensor([nth(0, x.shape)/s])
g : tensor([10]) → tensor([1])


These types indicate that f takes a 1-dimensional tensor (i.e., a vector) and 
returns a vector whose length equals the length of the argument vector divided 
by s, and that g expects a vector of length 10 and returns a vector of length 1. 
The formal syntax of types will be introduced later in Section 2. 
For the program above, GRATEN’s best-effort inference outputs the following 
type for the function model. 
s:int → x:{v : tensor | len(v.shape) = 1 ∧ nth(0, v.shape)/s = 10} → tensor([1])


Here, the constraint nth(0, v.shape)/s = 10 for the shape of x is necessary for
this program not to raise a shape mismatch error at the application of g. The
inferred type of model is used to prevent any calls to model that violate the
constraint. Indeed, GRATEN rejects the call on line 3 of Figure 1, where the
arguments do not satisfy the constraint nth(0, v.shape)/s = 10. As in this example,
our approach can statically detect shape mismatches when enough type infor- 
mation has been obtained from the best-effort type inference or user-provided 
type annotations. 
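
To make the inferred constraint concrete, the sketch below models f and g at the level of shapes only. It is an illustration under stated assumptions, not GRATEN's implementation: `f_shape`, `g_shape`, `model_constraint`, `model_shape`, and the `Shape_mismatch` exception are hypothetical names, and shapes are plain integer lists.

```ocaml
exception Shape_mismatch of string

(* f: a 1-dimensional input of length n yields an output of length n / s. *)
let f_shape (s : int) (shape : int list) : int list =
  match shape with
  | [n] -> [n / s]
  | _ -> raise (Shape_mismatch "f expects a 1-dimensional tensor")

(* g: tensor([10]) -> tensor([1]) *)
let g_shape (shape : int list) : int list =
  if shape = [10] then [1]
  else raise (Shape_mismatch "g expects a tensor of shape [10]")

(* The refinement inferred for x: len(shape) = 1 and nth(0, shape) / s = 10.
   The guard s <> 0 is an addition of this sketch to avoid division by zero. *)
let model_constraint (s : int) (shape : int list) : bool =
  List.length shape = 1 && s <> 0 && List.nth shape 0 / s = 10

(* Shape-level version of model from Figure 1. *)
let model_shape (s : int) (shape : int list) : int list =
  g_shape (f_shape s shape)
```

With these definitions, `model_constraint 1 [20]` is false because 20 / 1 = 20, which mirrors the rejected call in Figure 1, while `model_constraint 2 [20]` is true and `model_shape 2 [20]` evaluates to `[1]`.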


Hybrid Type Checking. Another main feature of our approach is hybrid type 
checking: we combine static and dynamic checking. The type checker inserts 
assertions to program points where the type safety is not statically guaranteed, 
à la Knowles and Flanagan’s hybrid type checking [16]. For example, consider 
the program in Figure 2, which is obtained by adding a conditional branch to 
the one in Figure 1. The types of the then and else branches of the if expression
are inferred to be tensor(x.shape) and tensor([nth(0, x.shape)/s]), respectively. In
this case, the type of y is simply inferred to be tensor without any information
about its shape, and the inferred type for model is as follows.
s:int → x:{v : tensor | len(v.shape) = 1} → tensor([1])
Thus, the best-effort inference of GRATEN fails to capture the constraint
nth(0, x.shape)/s = 10 for x due to the imprecise type information of y. Along with
let model s =
  let f = ... in let g = ... in
  fun x ->
    let y = ((if s = 1 then x else f x) : tensor([nth 0 x.shape / s]))
    in g y

Fig. 4. The program from Figure 2 after adding type annotations. 


the inferred types, GRATEN outputs the program in Figure 3, which is the same 
as the original program except for the assertion inserted at the argument of g. 
Since the statically inferred type of y fails to guarantee that the application of 
g to y does not lead to a shape mismatch error, GRATEN inserts the assertion
to check the requirement dynamically. 
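
The effect of the inserted assertion can be reproduced in plain OCaml with a stand-in tensor type. This is only a sketch: the record type `tensor` carries a shape and no data, `f` takes `s` as an explicit argument, and the bodies of `f` and `g` are assumptions consistent with the types given earlier.

```ocaml
(* A toy tensor that carries only its shape. *)
type tensor = { shape : int list }

let f s x =
  match x.shape with
  | [n] -> { shape = [n / s] }
  | _ -> invalid_arg "f: expected a 1-dimensional tensor"

let g x =
  if x.shape = [10] then { shape = [1] }
  else invalid_arg "g: expected shape [10]"

(* Mirrors Figure 3: the shape of y is checked at run time, just before g. *)
let model s = fun x ->
  let y = if s = 1 then x else f s x in
  g (assert (y.shape = [10]); y)
```

Calling `model 2 { shape = [20] }` passes the assertion and returns a tensor of shape `[1]`, whereas `model 1 { shape = [20] }` raises `Assert_failure` before g is applied (unless assertions are disabled at compile time).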


Gradual Typing. Lastly, our approach incorporates gradual typing [27]⁴ so that
the users can improve the precision of inferred types by adding type annotations. 
For example, let us consider the program in Figure 4, which is obtained from 
the one in Figure 2 by adding a type annotation to y. With this annotation, 
GRATEN infers the same type for model as it did for model in Figure 1, and no 
assertions are inserted. As such, adding correct type annotations improves the 
type checking and decreases the number of assertions inserted. 

Thanks to the best-effort inference, users need not add type annotations
everywhere in the program. They can focus on the program points where the
static inference did not perform well, which is indicated by the insertion of asser- 
tions. We prove that our type system satisfies the gradual guarantee [27], which 
ensures that adding type annotation preserves the type-ability and the behavior 
of the program (with some assertions inserted) regardless of its precision, as long 
as the annotation does not disagree with the program. 


Among the three features, the notion of hybrid type checking was first pro- 
posed by Knowles and Flanagan [16], and our gradual typing is closely related to 
gradual refinement types [18], but we believe that the particular combination
of three features is new. In particular, unlike the original gradual refinement 
types [18], we insert assertions instead of carrying around evidence terms [11] in 
the reduction to guarantee type safety. 

The contributions are summarized as follows. (i) The formalization of a type 
system that combines hybrid type checking and gradual typing. We define our 
type system as the type-based transformation relation from source programs to 
programs with run-time assertion checks. We prove the soundness of our type 
system as well (Section 2). (ii) A proof that our system satisfies the gradual 
guarantee [27] (Section 3). (iii) Implementation of a best-effort type inference 


⁴ Usually, gradual typing introduces new syntax for gradual types and makes a distinction
between static types and gradual types. However, our type system does not
have such distinction; it only uses the standard refinement types. As we see later, 
we extend the standard refinement type system with cast (assertion) insertion rules 
so that it can be viewed as a gradualized type system. 
M (term) ::= c | x | λx:τ.M | M x | (M : τ) | let x = M1 in M2
           | fix(f:(x:τ1 → τ2), x, M) | if x then M1 else M2
τ (type) ::= {x : B | φ} | x:τ1 → τ2
Γ (type env.) ::= ∅ | Γ, x:τ          Δ (base type env.) ::= ∅ | Δ, x : B


Fig. 5. Syntax of the source language, the types and the type environments. 


on a prototype system, GRATEN (Section 4). (iv) Experimental evaluation
of GRATEN using the examples of deep learning programs bundled in the
OCaml-Torch library. We confirm that GRATEN can statically type-check the 
programs effectively with a reasonable amount of type annotations (Section 5). 


2 A Gradually-Typed Language with Refinement Types 


In this section, we formalize our type system and the translation to insert asser- 
tions. We first introduce the source and target languages of the translation in 
Sections 2.1 and 2.2. We then formalize the type system and the translation and 
prove their soundness in Section 2.3. The gradual guarantee is discussed later in 
Section 3. 


2.1 Source Language 


We consider a call-by-value functional language, whose syntax is given in Fig- 
ure 5. Throughout this paper, n, c, and x respectively denote integers, constants 
(including integers and primitive functions) and variables. The base types B and 
refinement predicates φ are explained later.

Type annotations can be added to the function arguments λx:τ.M, recursive
functions fix(f:(x:τ1 → τ2), x, M) and to arbitrary expressions by (M : τ). In
the implementation of GRATEN, users may omit the type annotations in lambda 
expressions and recursive functions as the best-effort type inference tries to com- 
plete them. 

The argument of a function application and the branching condition of an 
if-expression are restricted to variables for the sake of simplicity of typing rules. 
Note that this restriction does not lose generality, as a general function application
M1 M2 can be normalized to let f = M1 in let x = M2 in f x.
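
The normalization just described can be sketched as a pass over a small term type. The constructors below and the fresh-name scheme (`_t1`, `_t2`, ...) are hypothetical and only cover enough of the language to show the rewrite; the sketch assumes source programs do not already use names of that form.

```ocaml
(* A tiny term language; constructor names are illustrative only. *)
type term =
  | Var of string
  | Const of int
  | Lam of string * term
  | App of term * term            (* general application M1 M2 *)
  | Let of string * term * term

(* Fresh variable supply. *)
let counter = ref 0
let fresh () = incr counter; Printf.sprintf "_t%d" !counter

(* Rewrite applications so that every argument is a variable. *)
let rec normalize (m : term) : term =
  match m with
  | Var _ | Const _ -> m
  | Lam (x, body) -> Lam (x, normalize body)
  | Let (x, m1, m2) -> Let (x, normalize m1, normalize m2)
  | App (m1, Var x) -> App (normalize m1, Var x)   (* already in restricted form *)
  | App (m1, m2) ->
    (* M1 M2  becomes  let f = M1 in let x = M2 in f x *)
    let f = fresh () in
    let x = fresh () in
    Let (f, normalize m1, Let (x, normalize m2, App (Var f, Var x)))
```

The same idea applies to the condition of an if-expression, which can be let-bound to a variable before branching.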

Types are defined following the standard definition of refinement types. Intuitively,
the type {x:B | φ} describes a value x of type B such that φ holds. For example,
{x:int | x ≥ 0} is the type of non-negative ints. We may omit the refinement
predicates when they are true. For example, we may write {x:int | true}
as int.

The language presented so far is general; in GRATEN it is instantiated to a 
language for tensor programs by defining the base types and refinement pred- 
icates as in Figure 6, and assuming that primitive operations on tensors are 
included in the set of constants ranged over c. The refinement predicates, shapes 
B (base type) ::= bool | int | int list | tensor

φ (predicate) ::= true | false | s1 = s2 | S1 = S2 | x | ¬φ | φ1 ∧ φ2 | φ1 ∨ φ2
                | broadcastable(S1, S2) | reshapeable(S1, S2)

S (shape) ::= [s1; ...; sn] | x | x.shape | cons(s, S) | append(S1, S2) | tail(S)
            | init(S) | insertAt(s1, s2, S) | dropAt(s, S) | swap(s1, s2, S)
            | reshape(S1, S2) | broadcast(S1, S2) | matmul(S1, S2)

s (size) ::= n | x | −s | s1 + s2 | s1 × s2 | s1 / s2 | head(S) | last(S)
           | len(S) | nth(s, S) | prod(S)

Fig. 6. Syntax of base types B and predicates φ in GRATEN.


v (value) ::= c | x | [v1, ..., vn] | λx^τ.N | fix(f^(x:τ1 → τ2), x, N)
N (cast term) ::= v | if v then N1 else N2 | N v | let x^τ = N1 in N2 | assert(φ); N


Fig. 7. Syntax of the target language. 


and sizes are expressions of type bool, int list and int respectively. The sup- 
ported predicates are those described by quantifier-free formulas of first-order 
logic. As shown in the definition, they may use some built-in predicates and 
functions over integer lists such as append and primitives on integer arithmetic 
in order to express common tensor operations. We implicitly assume that the 
refinement predicates are well formed (as defined in the full version [13]). 


2.2 Target Language 


As explained in Section 1, we insert run-time checks into places where type-safety 
cannot be statically guaranteed. Figure 7 shows the syntax of programs obtained 
by the insertion of assertions. A main difference from the source language is the 
addition of assertion assert(φ); N, which is used to implement the run-time
checks. Like Flanagan’s hybrid type system [16] (and unlike the blame calcu- 
lus [32]), we guarantee the safety of target programs by assertions. Compared 
with the blame calculus, this method is expected to be easier to implement since 
most of the modern programming languages are equipped with assertions, and 
more efficient in that it avoids the accumulation of dynamic casts at runtime. 
This implementation of the dynamic cast is possible since our system is only 
“gradualized” at the predicate level of the refinement type and the underlying
simple type is static. 

Another difference is that the binders in let expressions are annotated with 
their type. This is required when defining the precision relation over the cast 
terms in Section 3. 

The substitution and the reduction rules of the cast terms are presented 
in Figure 8. The evaluation of primitive function ev(c,v) is defined to be the 
return value of the primitive function c applied to an argument v if v meets the 
Substitution [v/x]N:
  [v/x](assert(φ); N) = assert([v/x]φ); [v/x]N
  [v/x](λy^τ.N) = λy^([v/x]τ).[v/x]N
  (Variables are assumed to be alpha-renamed so that
  variables at different scopes do not collide.)

Reduction N1 ⟶ N2:
  assert(true); N ⟶ N
  assert(false); N ⟶ error
  c v ⟶ ev(c, v)
  (λx^τ.N1) v ⟶ [v/x]N1

Fig. 8. Selected rules of substitution and reduction of the target language (the full
definition is given in the full version [13]).


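Γ; φ ⊢ c : ty(c)   (CT-Con)

Γ(x) = y:τ1 → τ2
───────────────── (CT-VF)
Γ; φ ⊢ x : Γ(x)

Γ(x) = {y : B | φ'}
─────────────────────────── (CT-VB)
Γ; φ ⊢ x : {y : B | y = x}

Γ, x:τ1; φ ⊢ N : τ2
──────────────────────────── (CT-Lam)
Γ; φ ⊢ λx^τ1.N : x:τ1 → τ2

Γ; φ ⊢ N : x:τ1 → τ2    Γ; φ ⊢ v : τ1
────────────────────────────────────── (CT-App)
Γ; φ ⊢ N v : [v/x]τ2

Γ, f:(x:τ1 → τ2), x:τ1; φ ⊢ N : τ2
──────────────────────────────────────────── (CT-Fix)
Γ; φ ⊢ fix(f^(x:τ1 → τ2), x, N) : x:τ1 → τ2

Γ; φ ∧ φ' ⊢ N : τ
──────────────────────────── (CT-Ass)
Γ; φ ⊢ assert(φ'); N : τ

Γ; φ ⊢ v : {x : bool | φ'}    Γ; φ ∧ v ⊢ N1 : τ    Γ; φ ∧ ¬v ⊢ N2 : τ
────────────────────────────────────────────────────────────────────── (CT-If)
Γ; φ ⊢ if v then N1 else N2 : τ

Γ; φ ⊢ N1 : τ1    Γ, x:τ1; φ ⊢ N2 : τ
────────────────────────────────────── (CT-Let)
Γ; φ ⊢ let x^τ1 = N1 in N2 : τ

Γ; φ ⊢ N : τ'    Γ; φ ⊢ τ' <: τ
──────────────────────────────── (CT-Sub)
Γ; φ ⊢ N : τ


Fig. 9. Typing rules for the cast terms Γ; φ ⊢ N : τ.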


constraint of the argument of c, and otherwise undefined. We write N ⇑ if there
exists an infinite reduction sequence from N. 
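
The reduction rules for assertions and primitive applications can be sketched as a one-step function. This is a simplified model rather than the paper's full calculus: predicates are assumed to be already decided (a `bool`), the only primitives are hypothetical `head` and `len` operations on shapes, and λ-abstraction, let and substitution are omitted.

```ocaml
type value = Int of int | Shape of int list

type term =
  | Val of value
  | Assert of bool * term     (* assert(φ); N, with φ already evaluated *)
  | Prim of string * value    (* c v *)
  | Error

(* ev(c, v): defined only when v meets the argument constraint of c. *)
let ev (c : string) (v : value) : value option =
  match c, v with
  | "head", Shape (s :: _) -> Some (Int s)
  | "len", Shape s -> Some (Int (List.length s))
  | _ -> None

(* One reduction step, or None when no rule applies. *)
let step (t : term) : term option =
  match t with
  | Assert (true, n) -> Some n
  | Assert (false, _) -> Some Error
  | Prim (c, v) ->
    (match ev c v with Some r -> Some (Val r) | None -> None)
  | Val _ | Error -> None

(* Reduce until no rule applies. *)
let rec eval (t : term) : term =
  match step t with Some t' -> eval t' | None -> t
```

In this sketch a primitive applied to an argument outside its domain, such as `Prim ("head", Shape [])`, is simply stuck, which corresponds to ev(c, v) being undefined.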

The substitution for cast terms is defined in the standard manner, except that 
the implicitly-annotated type information and the predicate in the assertion need 
to be updated as well. As can be seen in the definition of the cast term reduction, 
these implicitly-annotated types are only required for the sake of formalization 
and ignored at runtime. 

We also introduce the type derivation rules for the cast terms Γ; φ ⊢ N : τ
in Figure 9. This relation is used in the discussion of the soundness of the type
system later in Section 2.3. The quadruple relation Γ; φ ⊢ N : τ denotes that
a cast term N has type τ under a type environment Γ and a logical context
φ. The logical context φ holds the information of logically valid predicates at
respective program points. New predicates are added at the then branch and
the else branch of (CT-If), and the post-assertion cast term in (CT-Ass). The
subsumption is allowed in (CT-Sub) by the subtyping relation Γ; φ ⊢ τ1 <: τ2
(Figure 10), which is defined in a standard manner.
⟦Γ⟧ and BT(Γ):
  ⟦∅⟧ = true
  ⟦Γ, x : {y : B | φ}⟧ = ⟦Γ⟧ ∧ [x/y]φ
  ⟦Γ, x : (y:τ1 → τ2)⟧ = ⟦Γ⟧
  BT(∅) = ∅
  BT(Γ, x : {y : B | φ}) = BT(Γ), x : B
  BT(Γ, x : (y:τ1 → τ2)) = BT(Γ)

Γ; φ ⊢ τ1 <: τ2:

⊨ ∀BT(Γ), x:B. ⟦Γ⟧ ∧ φ ∧ φ1 ⇒ φ2
──────────────────────────────────── (Sub-Base)
Γ; φ ⊢ {x : B | φ1} <: {x : B | φ2}

Γ; φ ⊢ τ3 <: τ1    Γ, x:τ3; φ ⊢ τ2 <: τ4
───────────────────────────────────────── (Sub-Fun)
Γ; φ ⊢ x:τ1 → τ2 <: x:τ3 → τ4
Fig. 10. Subtyping rules. 
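
The two auxiliary functions of Figure 10 and the premise of (Sub-Base) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the classes `Base` and `Fun`, the encoding of predicates as Python callables, and the bounded enumeration over a small integer domain, which only approximates the validity check that a real implementation would delegate to a solver.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Valuation = Dict[str, int]

@dataclass
class Base:
    """Refinement type {y : B | pred}; pred receives the valuation and y."""
    base: str
    pred: Callable[[Valuation, int], bool]

@dataclass
class Fun:
    """Function type y:arg -> ret; ignored by both functions below."""
    arg: object
    ret: object

Env = List[Tuple[str, Union[Base, Fun]]]

def base_types(env: Env) -> List[Tuple[str, str]]:
    """BT(Gamma): keep the base-typed bindings and drop their refinements."""
    return [(x, t.base) for x, t in env if isinstance(t, Base)]

def extraction(env: Env) -> Callable[[Valuation], bool]:
    """[[Gamma]]: conjunction of [x/y]phi over all bindings x:{y:B|phi}."""
    def holds(val: Valuation) -> bool:
        return all(t.pred(val, val[x]) for x, t in env if isinstance(t, Base))
    return holds

def sub_base(env, ctx, phi1, phi2, domain=range(-3, 4)) -> bool:
    """Bounded check of the (Sub-Base) premise over a small integer domain."""
    names = [x for x, _ in base_types(env)]
    gamma = extraction(env)
    for values in itertools.product(domain, repeat=len(names) + 1):
        val, x = dict(zip(names, values[:-1])), values[-1]
        if gamma(val) and ctx(val) and phi1(val, x) and not phi2(val, x):
            return False  # counterexample to the implication
    return True

# Hypothetical environment: n : {y:int | y > 0}, f : some function type.
env = [("n", Base("int", lambda val, y: y > 0)), ("f", Fun(None, None))]
always = lambda val: True
print(sub_base(env, always, lambda val, x: x == val["n"], lambda val, x: x > 0))
print(sub_base(env, always, lambda val, x: x > 0, lambda val, x: x == val["n"]))
```

In this sketch, `{x:int | x = n}` is accepted as a subtype of `{x:int | x > 0}` because the refinement of `n` in the environment contributes `n > 0` to the left-hand side of the implication, while the converse direction is rejected.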


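$$\Gamma; \varphi \vdash c \rightsquigarrow c : ty(c) \quad \text{(CI-Const)} \qquad \frac{\Gamma(x) = y{:}\tau_1 \to \tau_2}{\Gamma; \varphi \vdash x \rightsquigarrow x : \Gamma(x)} \quad \text{(CI-Var-Fun)}$$

$$\frac{\Gamma(x) = \{y : B \mid \varphi'\}}{\Gamma; \varphi \vdash x \rightsquigarrow x : \{y : B \mid y = x\}} \quad \text{(CI-Var-Base)} \qquad \frac{\Gamma, x{:}\tau_1; \varphi \vdash M \rightsquigarrow N : \tau_2}{\Gamma; \varphi \vdash \lambda x{:}\tau_1.M \rightsquigarrow \lambda x^{\tau_1}.N : x{:}\tau_1 \to \tau_2} \quad \text{(CI-Lam)}$$

$$\frac{\Gamma; \varphi \vdash M \rightsquigarrow N_1 : y{:}\tau_1 \to \tau_2 \quad \Gamma(x) = \tau_3 \quad \Gamma; \varphi \vdash \tau_3 \lesssim \tau_1 \rightsquigarrow N_2}{\Gamma; \varphi \vdash M\,x \rightsquigarrow (\mathtt{let}\ x^{\tau_1} = N_2\,x\ \mathtt{in}\ N_1\,x) : [x/y]\tau_2} \quad \text{(CI-App)}$$

$$\frac{\Gamma, f{:}(x{:}\tau_1 \to \tau_2), x{:}\tau_1; \varphi \vdash M \rightsquigarrow N : \tau_2}{\Gamma; \varphi \vdash \mathtt{fix}(f{:}(x{:}\tau_1 \to \tau_2), x, M) \rightsquigarrow \mathtt{fix}(f^{x:\tau_1 \to \tau_2}, x, N) : x{:}\tau_1 \to \tau_2} \quad \text{(CI-Fix)}$$

$$\frac{\Gamma; \varphi \vdash M_1 \rightsquigarrow N_1 : \tau_1 \quad \Gamma, x{:}\tau_1; \varphi \vdash M_2 \rightsquigarrow N_2 : \tau \quad BT(\Gamma) \vdash_{wf} \tau}{\Gamma; \varphi \vdash (\mathtt{let}\ x = M_1\ \mathtt{in}\ M_2) \rightsquigarrow (\mathtt{let}\ x^{\tau_1} = N_1\ \mathtt{in}\ N_2) : \tau} \quad \text{(CI-Let)}$$

$$\frac{\Gamma; \varphi \vdash v : \{x : \mathtt{bool} \mid \varphi'\} \quad \Gamma; \varphi \wedge v \vdash M_1 \rightsquigarrow N_1 : \tau \quad \Gamma; \varphi \wedge \neg v \vdash M_2 \rightsquigarrow N_2 : \tau}{\Gamma; \varphi \vdash \mathtt{if}\ v\ \mathtt{then}\ M_1\ \mathtt{else}\ M_2 \rightsquigarrow \mathtt{if}\ v\ \mathtt{then}\ N_1\ \mathtt{else}\ N_2 : \tau} \quad \text{(CI-If)}$$

$$\frac{\Gamma; \varphi \vdash M \rightsquigarrow N : \tau}{\Gamma; \varphi \vdash (M : \tau) \rightsquigarrow N : \tau} \quad \text{(CI-Annot)} \qquad \frac{\Gamma; \varphi \vdash M_1 \rightsquigarrow N_1 : \tau_1 \quad \Gamma; \varphi \vdash \tau_1 \lesssim \tau \rightsquigarrow N_2}{\Gamma; \varphi \vdash M_1 \rightsquigarrow \mathtt{let}\ x^{\tau_1} = N_1\ \mathtt{in}\ N_2\,x : \tau} \quad \text{(CI-Sub)}$$

Fig. 11. Type derivation rules for the source language $\Gamma; \varphi \vdash M \rightsquigarrow N : \tau$.

Inserting Assertions Next, we discuss the typing rules for the source language
and the assertion insertion into it. Figure 11 defines the type judgement and
cast insertion relation. The intuition behind the 5-ary relation
$\Gamma; \varphi \vdash M \rightsquigarrow N : \tau$ is that, under a type
environment $\Gamma$ and a logical context $\varphi$, a term $M$ translates to a
cast term $N$ and has type $\tau$. If we ignore the part "$\rightsquigarrow N$"
and replace the gradual subtyping relation $\lesssim$ with the standard
subtyping relation on refinement types (Figure 10), our type system is a
standard refinement type system. Thus, the main novelty in the rules in Figure
11 lies in the use of the consistent subtyping relation
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$, which is
explained below.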

The consistent subtyping relation
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ (Figure 12)⁵
is used in the cast insertion relation to guarantee that there exists a value
that has both of the types $\tau_1$ and $\tau_2$ under $\Gamma$ and $\varphi$,
and to produce an assertion term $N$ that checks at runtime whether a value that
is statically known to be of type $\tau_1$ can be used as a value of type
$\tau_2$.

The rule for the base case (Cast-Base) checks whether there exist a value and an
assignment of values to the variables in the type environment that satisfy
both $\tau_1$ and $\tau_2$. This intuitively holds if $\tau_1$ is castable to
$\tau_2$ for some runtime values. The rule also produces a lambda function that
implements the cast with an assertion. It is defined in such a way that
$\varphi_2$ can always be used as the content of the assertion $\varphi'$, but
true can also be used for $\varphi'$ if $\varphi_1$ implies $\varphi_2$. Note
that we cannot simply fix $\varphi_2$ as the content of the assertion in the
definition, as otherwise Proposition 1 would not hold.
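
The base case can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the type environment is empty, the logical context is true, refinements range over integers, and both premises are checked by enumerating a finite `domain` rather than by a solver. The names `cast_base` and `Pred` are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

Pred = Callable[[int], bool]

def cast_base(phi1: Pred, phi2: Pred,
              domain: Iterable[int]) -> Optional[Tuple[Callable[[int], int], bool]]:
    """Return (cast, trivial) for a cast {x | phi1} to {x | phi2}, or None."""
    values = list(domain)
    # First premise: some value must satisfy both refinements.
    if not any(phi1(x) and phi2(x) for x in values):
        return None  # the cast is rejected statically
    # If phi1 implies phi2, the assertion content can be chosen as `true`;
    # otherwise phi2 itself is used.
    implied = all(phi2(x) for x in values if phi1(x))
    phi_prime: Pred = (lambda x: True) if implied else phi2

    def cast(x: int) -> int:
        if not phi_prime(x):
            raise AssertionError(f"cast failed at run time for {x}")
        return x

    return cast, implied

pos = lambda x: x > 0
nonneg = lambda x: x >= 0
print(cast_base(pos, nonneg, range(-5, 6))[1])             # trivial assertion?
print(cast_base(nonneg, pos, range(-5, 6))[1])             # needs a real check?
print(cast_base(pos, lambda x: x < 0, range(-5, 6)))       # no common value
```

The second component of the result records whether the inserted assertion is `assert(true)`, which is the situation described by Proposition 1.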

The rule for the function types (Cast-Fun) recursively checks the castability
of the argument types and the return types and combines the assertion terms
for them. Notice how the subsumption for the return types $\tau_2$ and $\tau_4$
has the meet of the two argument types $\tau_1 \sqcap \tau_3$ in the type
environment. The meet of two types (Figure 12) is defined as a conjunction of
the refinement predicates⁶.
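
The shape of the assertion term produced for function types can be mimicked with ordinary closures. The sketch below is an assumption-laden illustration: `check` and `cast_fun` are hypothetical helpers, refinements are plain Python predicates on integers, and the dependency of the result cast on the argument is ignored.

```python
from typing import Callable

def check(pred: Callable[[int], bool], label: str) -> Callable[[int], int]:
    """A base cast: identity guarded by an assertion on `pred`."""
    def cast(x: int) -> int:
        if not pred(x):
            raise AssertionError(f"{label} violated by {x}")
        return x
    return cast

def cast_fun(cast_arg, cast_ret):
    """Mirror of: fun f -> fun x -> let y = N1 x in let z = f y in N2 z."""
    def wrap(f):
        def wrapped(x):
            y = cast_arg(x)     # cast the argument to the type f expects
            z = f(y)            # run the original function
            return cast_ret(z)  # cast the result to the type the caller expects
        return wrapped
    return wrap

# Hypothetical function of type x:{x > 0} -> {y >= 0}, used where a function
# of type int -> {y > 0} is expected.
decrement = lambda x: x - 1
guarded = cast_fun(check(lambda x: x > 0, "argument"),
                   check(lambda y: y > 0, "result"))(decrement)
print(guarded(5))
```

Calling `guarded(1)` fails in the result cast and `guarded(0)` fails in the argument cast, so each failure is reported at the corresponding assertion.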

The consistent subtyping relation can be seen as a gradualization of the
subtyping relation $\Gamma; \varphi \vdash \tau_1 <: \tau_2$ (Figure 10). In
fact, when a type $\tau_1$ is a subtype of another type $\tau_2$, the assertion
term generated by casting $\tau_1$ to $\tau_2$ can be chosen so that it only
contains assertions that always succeed, which can then be erased by an
optimization. The following proposition states this fact. Note that this
corresponds to the blame-subtyping theorem, one of the criteria for gradual
typing presented in [27].


Proposition 1. $\Gamma; \varphi \vdash \tau_1 <: \tau_2$ implies
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ for some $N$
where all the assertions in $N$ are of the form $\mathtt{assert}(\mathit{true}); N'$.


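5 This can be understood as the refinement-type version of the differential subtyping
in [23], although in the implementation we do not calculate the "difference" between
$\varphi_1$ and $\varphi_2$ for $\varphi'$ in the assertion unless $\varphi_1$ implies
$\varphi_2$ (and thus $\varphi'$ can be true).

6 Although the meet of two function types is defined, it does not make any difference
in the definition of the consistent subtyping relation since function types in the
type environment are not used.

$$\{x : B \mid \varphi_1\} \sqcap \{x : B \mid \varphi_2\} = \{x : B \mid \varphi_1 \wedge \varphi_2\}$$

$$(x{:}\tau_1 \to \tau_2) \sqcap (x{:}\tau_3 \to \tau_4) = x{:}(\tau_1 \sqcap \tau_3) \to (\tau_2 \sqcap \tau_4)$$

$$\frac{\models \exists BT(\Gamma), x{:}B.\ [\![\Gamma]\!] \wedge \varphi \wedge \varphi_1 \wedge \varphi_2 \qquad \models \forall BT(\Gamma), x{:}B.\ [\![\Gamma]\!] \wedge \varphi \wedge \varphi_1 \Rightarrow (\varphi' \Leftrightarrow \varphi_2)}{\Gamma; \varphi \vdash \{x : B \mid \varphi_1\} \lesssim \{x : B \mid \varphi_2\} \rightsquigarrow \lambda x^{\{x : B \mid \varphi_1\}}.\ \mathtt{assert}(\varphi'); x} \quad \text{(Cast-Base)}$$

$$\frac{\Gamma; \varphi \vdash \tau_3 \lesssim \tau_1 \rightsquigarrow N_1 \qquad \Gamma, x{:}\tau_1 \sqcap \tau_3; \varphi \vdash \tau_2 \lesssim \tau_4 \rightsquigarrow N_2}{\Gamma; \varphi \vdash x{:}\tau_1 \to \tau_2 \lesssim x{:}\tau_3 \to \tau_4 \rightsquigarrow \lambda f.\lambda x.(\mathtt{let}\ y = N_1\,x\ \mathtt{in}\ \mathtt{let}\ z = f\,y\ \mathtt{in}\ N_2\,z)} \quad \text{(Cast-Fun)}$$


Fig. 12. Definition of the consistent subtyping relation $\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$.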


Type Safety We conclude this section with a note on the soundness of our 
type system. The soundness is based on the fact that if the source program is 
well-typed, the program after the assertion insertion is also well-typed. 

The most critical part of the proof is to prove the assertion term can be 
assigned a function type from the pre-assertion type to the post-assertion type. 


Lemma 1. $\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ implies
$\Gamma; \varphi \vdash N : x{:}\tau_1 \to \tau_2$ for some variable $x$ that does
not occur in $\tau_2$.


The proof is found in the full version [13]. With Lemma 1, we can prove 
that the assertion-inserted program can be assigned the same type as that of the 
original program. 


Lemma 2 (Assertion Insertion Preserves Types).
$\Gamma; \varphi \vdash M \rightsquigarrow N : \tau$ implies $\Gamma; \varphi \vdash N : \tau$.


We can also prove the standard progress and preservation properties under 
a reasonable assumption that the types of the primitive functions are properly 
defined as follows (see the full version [13] for the proofs). 


Assumption 1. $\vdash c\,v : \tau$ implies that $ev(c, v)$ is defined and $\vdash ev(c, v) : \tau$.


Combining Lemma 2 with the progress and preservation properties, we obtain 
the type safety as follows. 


Theorem 1 (Type Safety). With Assumption 1,
$\emptyset; \mathit{true} \vdash M \rightsquigarrow N : \tau$ implies
$N \to^* v$ for some $v$, $N \Uparrow$, or $N \to^* \mathtt{error}$.


The type safety property states that a well-typed program does not cause
untrapped dynamic errors. The only case where a cast-inserted program causes
untrapped errors is when the result of an application of a primitive function
is undefined (i.e., $ev(c, v)$ is undefined). The type safety property ensures
that such untrapped errors do not happen for well-typed terms as long as
$ty(c)$ is defined appropriately.

$$\frac{\models \forall \tilde{x}, x.\ \varphi_1 \Rightarrow \varphi_2}{\tilde{x} \vdash \{x : B \mid \varphi_1\} \sqsubseteq \{x : B \mid \varphi_2\}} \quad \text{(Prec-Base)}$$

$$\frac{\tilde{x} \vdash \tau_1 \sqsubseteq \tau_3 \qquad \tilde{x}, x \vdash \tau_2 \sqsubseteq \tau_4}{\tilde{x} \vdash x{:}\tau_1 \to \tau_2 \sqsubseteq x{:}\tau_3 \to \tau_4} \quad \text{(Prec-Fun)}$$

$$\emptyset \sqsubseteq \emptyset \qquad \frac{\Gamma_1 \sqsubseteq \Gamma_2 \qquad dom(\Gamma_1) \vdash \tau_1 \sqsubseteq \tau_2}{\Gamma_1, x : \tau_1 \sqsubseteq \Gamma_2, x : \tau_2}$$


Fig. 13. Precision relation of types and type environments.


3 Gradual Guarantee 


In a standard gradual type system, programs are compared by their precision,
that is, the amount of information contained in the type annotations. This
notion is used to define the gradual guarantee [27], which is the core property
of gradual typing. The gradual guarantee comes in two parts. The first is the
static gradual guarantee, which states that decreasing the precision of the type
annotations of a well-typed program preserves the typeability of the program at
a less precise type. The second is the dynamic gradual guarantee, which claims
that a less precise program behaves the same as the more precise one, except
that it raises fewer assertion errors.

Below we first define the precision for the language introduced in Section 2. 
We then show that our type system satisfies the gradual guarantee. 


Precision. Figure 13 defines the precision relation
$\tilde{x} \vdash \tau_1 \sqsubseteq \tau_2$ on types by using the logical
implication between the refinement predicates. The sequence of variables
$\tilde{x}$ keeps the variables that may appear in the refinement predicates.
For example, the following is an instance of the type precision relation for
the base type.

$$\vdash \{x : \mathtt{tensor} \mid x.\mathtt{shape} = [3]\} \sqsubseteq \{x : \mathtt{tensor} \mid \mathtt{len}(x.\mathtt{shape}) = 1\}$$
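
As a rough illustration of this example, the implication behind (Prec-Base) can be checked by brute force over a small set of candidate shapes. The function name `more_precise` and the bounded set `shapes` are assumptions of this sketch; a bounded enumeration is of course not a proof of validity.

```python
import itertools

def more_precise(phi1, phi2, shapes):
    """Bounded check of (Prec-Base): phi1 implies phi2 on every candidate."""
    return all(phi2(s) for s in shapes if phi1(s))

# All shapes of rank 0..3 with dimensions between 1 and 4.
shapes = [list(s) for rank in range(4)
          for s in itertools.product(range(1, 5), repeat=rank)]

exact = lambda s: s == [3]        # x.shape = [3]
rank1 = lambda s: len(s) == 1     # len(x.shape) = 1

print(more_precise(exact, rank1, shapes))
print(more_precise(rank1, exact, shapes))
```

The first check succeeds and the second fails (for instance on the shape `[1]`), matching the direction of the precision relation in the example.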

Note that in the rule (Prec-Fun), the precision of the argument types and that
of the return types are compared independently; the type information on $x$ is
not used in the comparison of the return types. This is in contrast with the
rule (Sub-Fun) in Figure 10 for subtyping. Figure 13 also extends the relation
to $\Gamma \sqsubseteq \Gamma'$ on type environments. The precision relation is
also extended to the relation $\tilde{x} \vdash M \sqsubseteq M'$ on terms, by
the rules in Figure 14. Here, $\tilde{x}$ is the sequence of variables in scope.
Finally, we define the precision relation of the cast terms in Figure 14. Unlike
the term precision relation (Figure 14), the precision relation
$\Gamma; \varphi \vdash N_1 \sqsubseteq N_2$ on cast terms requires the type
environment $\Gamma$ and the logical context $\varphi$ in the judgement, and the
refinement extraction from the type environment ($[\![\Gamma]\!]$) is used in
the rule (PC-Assert). We also assume the following property on the evaluation
of the primitive functions.


Assumption 2. If $ev(c, v_2)$ and $ev(c, v_1)$ are both defined, then
$v_1 \sqsubseteq v_2$ implies $ev(c, v_1) \sqsubseteq ev(c, v_2)$.

$$\frac{\tilde{x} \vdash \tau_1 \sqsubseteq \tau_2 \qquad \tilde{x}, x \vdash M_1 \sqsubseteq M_2}{\tilde{x} \vdash \lambda x{:}\tau_1.M_1 \sqsubseteq \lambda x{:}\tau_2.M_2} \quad \text{(PM-Lam)} \qquad \frac{\tilde{x} \vdash M_1 \sqsubseteq M_2 \qquad \tilde{x} \vdash \tau_1 \sqsubseteq \tau_2}{\tilde{x} \vdash (M_1 : \tau_1) \sqsubseteq (M_2 : \tau_2)} \quad \text{(PM-Annot)}$$

$$\frac{\models \forall BT(\Gamma).\ [\![\Gamma]\!] \wedge \varphi \wedge \varphi_1 \Rightarrow \varphi_2 \qquad \Gamma; \varphi \wedge \varphi_1 \vdash N_1 \sqsubseteq N_2}{\Gamma; \varphi \vdash \mathtt{assert}(\varphi_1); N_1 \sqsubseteq \mathtt{assert}(\varphi_2); N_2} \quad \text{(PC-Assert)}$$


Fig. 14. Selected rules for the precision relation on terms and cast terms (the full
definition is found in the full version [13]).


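Intuitively, the precision of cast terms is designed in such a way that, when
$\emptyset; \mathit{true} \vdash N_1 \sqsubseteq N_2$ holds, the assertions in
$N_1$ are stricter than those in $N_2$, and therefore the dynamic checks in
$N_1$ are more likely to fail than those in $N_2$. The following two
propositions state this intuition (the proofs are found in the full
version [13]).


Proposition 2. Suppose $\emptyset; \mathit{true} \vdash N_1 : \tau$ and
$\emptyset; \mathit{true} \vdash N_2 : \tau'$. Then,
$\emptyset; \mathit{true} \vdash N_1 \sqsubseteq N_2$ and $N_1 \to N_1'$ imply
$N_2 \to N_2'$ and $\emptyset; \mathit{true} \vdash N_1' \sqsubseteq N_2'$
for some $N_2'$.


Proposition 3. Suppose $\emptyset; \mathit{true} \vdash N_1 : \tau$ and
$\emptyset; \mathit{true} \vdash N_2 : \tau'$. Then,
$\emptyset; \mathit{true} \vdash N_1 \sqsubseteq N_2$ and $N_2 \to N_2'$ imply
either of the following.


- $N_1 \to N_1'$ and $N_1' \sqsubseteq N_2'$ for some $N_1'$
- $N_1 \to \mathtt{error}$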


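Gradual Guarantee. We show that our system satisfies the gradual guarantee [27].
First, we prove that the consistent subtyping relation
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ is
upper-closed with respect to the precision relation
$\tilde{x} \vdash \tau_1 \sqsubseteq \tau_2$ on types.


Lemma 3. $\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N_1$,
$dom(\Gamma) \vdash \tau_1 \sqsubseteq \tau_3$,
$dom(\Gamma) \vdash \tau_2 \sqsubseteq \tau_4$ and $\Gamma \sqsubseteq \Gamma'$
imply $\Gamma'; \varphi \vdash \tau_3 \lesssim \tau_4 \rightsquigarrow N_2$ for
some $N_2$.


We can further prove that the cast term $N_2$ in the statement of Lemma 3 is
less precise than the original cast term $N_1$ as follows.


Lemma 4. Suppose $\Gamma \sqsubseteq \Gamma'$,
$dom(\Gamma) \vdash \tau_1 \sqsubseteq \tau_3$ and
$dom(\Gamma) \vdash \tau_2 \sqsubseteq \tau_4$. Then,
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ and
$\Gamma'; \varphi \vdash \tau_3 \lesssim \tau_4 \rightsquigarrow N'$ imply
$\Gamma; \varphi \vdash N \sqsubseteq N'$.


Using the above properties, we can prove the following lemma, which constitutes
the core part of the proof of the gradual guarantee.


Lemma 5. $\Gamma \sqsubseteq \Gamma'$, $dom(\Gamma) \vdash M \sqsubseteq M'$ and
$\Gamma; \varphi \vdash M \rightsquigarrow N : \tau$ imply
$\Gamma'; \varphi \vdash M' \rightsquigarrow N' : \tau'$,
$\Gamma; \varphi \vdash N \sqsubseteq N'$ and
$dom(\Gamma) \vdash \tau \sqsubseteq \tau'$ for some $N'$ and $\tau'$.

Finally, we can show the static and dynamic gradual guarantees as follows.


Theorem 2 (Static gradual guarantee). $\emptyset \vdash M_1 \sqsubseteq M_2$ and
$\vdash M_1 : \tau_1$ imply $\vdash M_2 : \tau_2$ and
$\emptyset \vdash \tau_1 \sqsubseteq \tau_2$ for some $\tau_2$.


Proof. This follows immediately from Lemma 5.


Theorem 3 (Dynamic gradual guarantee). Suppose
$\emptyset \vdash M_1 \sqsubseteq M_2$ and
$\vdash M_1 \rightsquigarrow N_1 : \tau_1$. Then, there exist $N_2$ and $\tau_2$
that satisfy all of the following.


- $\vdash M_2 \rightsquigarrow N_2 : \tau_2$.
- $N_1 \to^* v_1$ implies $N_2 \to^* v_2$ and $v_1 \sqsubseteq v_2$ for some $v_2$.
- $N_1 \Uparrow$ implies $N_2 \Uparrow$.
- $N_2 \to^* v_2$ implies $N_1 \to^* v_1$ and $v_1 \sqsubseteq v_2$ for some $v_1$, or $N_1 \to^* \mathtt{error}$.
- $N_2 \Uparrow$ implies $N_1 \Uparrow$ or $N_1 \to^* \mathtt{error}$.


Proof. By Lemma 5, $\vdash M_2 \rightsquigarrow N_2 : \tau_2$ holds for some
$N_2$ and $\tau_2$ where $\vdash N_1 \sqsubseteq N_2$ and
$\vdash \tau_1 \sqsubseteq \tau_2$. Also, from Lemma 2, we obtain
$\vdash N_1 : \tau_1$ and $\vdash N_2 : \tau_2$. Using Proposition 2,
$N_1 \to^* v_1$ for some $v_1$ implies $N_2 \to^* v_2$ for some $v_2$ such that
$v_1 \sqsubseteq v_2$. Also, $N_1 \Uparrow$ implies $N_2 \Uparrow$. Using
Proposition 3, $N_2 \to^* v_2$ for some $v_2$ implies $N_1 \to^* v_1$ for some
$v_1$ such that $v_1 \sqsubseteq v_2$, or $N_1 \to^* \mathtt{error}$. Also,
$N_2 \Uparrow$ implies $N_1 \Uparrow$ or $N_1 \to^* \mathtt{error}$.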
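
The behaviour described by the dynamic gradual guarantee can be observed on a toy pair of cast terms that differ only in the strictness of their assertion. The sketch below is an illustration under strong assumptions: terms are modelled as an assertion predicate followed by a body, divergence is not modelled, and precision on result values is taken to be equality. The names `run`, `strict`, `loose` and `guarantee_holds` are hypothetical.

```python
def run(assertion, body, x):
    """Evaluate `assert(phi); body` on input x."""
    if not assertion(x):
        return ("error", None)
    return ("value", body(x))

strict = lambda s: s == [3]       # more precise: x.shape = [3]
loose = lambda s: len(s) == 1     # less precise: len(x.shape) = 1
body = lambda s: 2 * s[0]

def guarantee_holds(inputs):
    for x in inputs:
        r1, r2 = run(strict, body, x), run(loose, body, x)
        # If the more precise term yields a value, so does the less precise one.
        if r1[0] == "value" and r2 != r1:
            return False
        # If the less precise term yields a value, the more precise one
        # yields the same value or an assertion error.
        if r2[0] == "value" and r1 not in (r2, ("error", None)):
            return False
    return True

print(guarantee_holds([[3], [4], [2, 3], []]))
```

On the input `[4]`, for example, the stricter term raises an assertion error while the looser one returns a value, which is the only kind of difference the theorem permits.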


4 Best-Effort Type Inference 


Thanks to our combination of gradual typing and hybrid checking described in 
the previous sections, a type inference procedure need not necessarily output 
the most precise types. It is allowed to perform type inference only in a best- 
effort manner, and the results in the previous sections do not depend on the 
particular design of the type inference procedure. Nevertheless, it is desirable 
for the procedure to infer reasonably good types. In this section, we report a
specific design of the type inference procedure, which we have implemented in
our prototype system GRATEN; as reported in Section 5, our procedure
works reasonably well for actual deep learning programs.


4.1 Overview of Type Inference and Checking in GRATEN 


The type checking in GRATEN consists of the following three phases: (1) sim- 
ple type inference, (2) best-effort refinement type inference, and (3) consistent 
subtyping checking and assertion insertion. 

In the first phase, GRATEN performs the simple type inference using the 
standard Hindley-Milner algorithm and annotates the AST with the inferred 
simple types of each node. 

In the second phase, GRATEN first collects all the consistent subtyping
constraints of the form
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ from the
source program. When it encounters AST nodes whose refinement type cannot be
constructed directly, GRATEN generates template refinement types using the
simple types inferred in the previous phase. Template refinement types may
contain variables for undetermined predicates (referred to as predicate
variables).

Using the collected constraints, GRATEN then tries to find a solution for all
of the predicate variables with its hand-made constraint solver. The constraint
solving takes place on every let binding to allow let-polymorphism on shapes.
We discuss the details of the implementation of the solver in the next
subsection, but at a high level, the solver tries to find a solution such that:


- only general types are inferred, as otherwise it could result in rejecting
well-typed programs.

- $\Gamma; \varphi \vdash \tau_1 <: \tau_2$ holds for as many constraints
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ as possible.
This is to make the cast term $N$ consist of trivial assertions (which can
statically be discharged to avoid run-time overheads; recall Proposition 1).


Given that the subtyping constraints can be expressed in the form of constrained 
Horn clauses (CHC) and not all the subtyping constraints need to hold, the 
problem above is essentially a CHC solving problem with weak constraints and 
maximality [22] where the optimization objective of the problem is defined by 
pointwise logical comparison of the solutions. 

The constraint solver of GRATEN does not always find a solution for all 
predicate variables. In such cases, GRATEN assigns true to the undetermined 
predicate variables; that way, they will at least not invalidate the consistent 
subtyping constraints. 

Note that GRATEN does not take the consistent subtyping
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ itself into
account when trying to find a solution, as we expect that it would be rare for
a consistent subtyping
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ to hold when
the subtyping relation $\Gamma; \varphi \vdash \tau_1 <: \tau_2$ does not hold.
GRATEN therefore defers the check of consistent subtyping constraints to the
next phase.

In the third phase, GRATEN checks the validity of the consistent subtyping
constraints using the inference results for the predicate variables from the
previous phase. GRATEN first attempts to simplify and verify the constraints
with a hand-made solver, and falls back on Z3 [5] with timeouts if that fails.
Simultaneously, it also generates the assertion terms and inserts them into the
source program.


4.2 Heuristics of Best-Effort Type Inference 


To solve the subtyping constraints explained above, we have implemented a 
hand-made constraint solver. GRATEN does not use off-the-shelf SMT or CHC 
solvers such as Z3 [5], since the refinement predicates in GRATEN often use 
complicated predicates on integer lists, for which standard SMT/CHC solvers 
cannot find a solution in a reasonable time. Also, while GRATEN should infer 
general types (so as not to reject well-typed programs), those generic solvers are 
not biased towards generality and return any (non-general) solution that satisfies 
the constraints. This subsection describes the heuristics used in GRATEN for
constraint solving.

The preparation for the inference already starts when GRATEN generates
the template refinement types during the constraint collection. For each
predicate variable generated, GRATEN attaches the set of program variables it
depends on, which is calculated from the type environment. This is used in the
constraint solving later to avoid assigning irrelevant predicates to the
predicate variables. We denote predicate variables as
$p_{\tilde{x}}(\tilde{y})$, where $\tilde{x}$ denotes the set of program
variables it depends on and $\tilde{y}$ denotes the parameters of the predicate
variable.

After collecting the constraints, GRATEN decomposes the subtyping constraints
into constrained Horn clauses of the form
$\Phi_1 \wedge \Phi_2 \Rightarrow \Phi_3$ following the definition of the
subtyping relation (Figure 10). The notation $\Phi$ denotes a set of
predicates, logically interpreted as the conjunction of the predicates. The
first, second, and third sets of predicates in the clause respectively
correspond to the predicates from the context $[\![\Gamma]\!] \wedge \varphi$,
the refinement of the type on the left $\varphi_1$, and that of the type on the
right $\varphi_2$. We intentionally distinguish between $\Phi_1$ and $\Phi_2$
on the left-hand side of the clauses in describing the constraint solving
algorithm. For example, let us reconsider the program in Figure 2. The
subtyping constraints collected from the if expression of the program would be
as follows, where $p$, $q$ and $r$ are the predicate variables generated for
the type of s, x and the if expression respectively.


$$\Gamma; (s = 1) \vdash \{v{:}\mathtt{tensor} \mid q_{s,v}(v)\} <: \{v{:}\mathtt{tensor} \mid r_{s,x,v}(v)\}$$

$$\Gamma; (s \neq 1) \vdash \{v{:}\mathtt{tensor} \mid q_{s,v}(v)\} <: \{v{:}\mathtt{tensor} \mid \mathtt{len}(v.\mathtt{shape}) = 1\}$$

$$\Gamma; (s \neq 1) \vdash \mathtt{tensor}([\mathtt{nth}(0, x.\mathtt{shape})/s]) <: \{v{:}\mathtt{tensor} \mid r_{s,x,v}(v)\}$$

$$\text{where } \Gamma := [s \mapsto \{v{:}\mathtt{int} \mid p_v(v)\},\ x \mapsto \{v{:}\mathtt{tensor} \mid q_{s,v}(v)\}]$$


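These constraints are decomposed into the following clauses.

$$\begin{aligned}
\{p_s(s), q_{s,x}(x), s = 1\} \wedge \{q_{s,v}(v)\} &\Rightarrow r_{s,x,v}(v) \\
\{p_s(s), q_{s,x}(x), s \neq 1\} \wedge \{q_{s,v}(v)\} &\Rightarrow \mathtt{len}(v.\mathtt{shape}) = 1 \qquad (1) \\
\{p_s(s), q_{s,x}(x), s \neq 1\} \wedge \{v.\mathtt{shape} = [\mathtt{nth}(0, x.\mathtt{shape})/s]\} &\Rightarrow r_{s,x,v}(v)
\end{aligned}$$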
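
The decomposition of one base-type subtyping constraint into a clause can be sketched as follows. The representation is an assumption made for illustration only: predicates are opaque strings, a clause is a triple of frozen sets, and `decompose` is a hypothetical helper rather than GRATEN's actual data structure.

```python
from typing import FrozenSet, Iterable, NamedTuple

class Clause(NamedTuple):
    context: FrozenSet[str]  # predicates from the environment and logical context
    left: FrozenSet[str]     # refinement of the type on the left
    right: FrozenSet[str]    # refinement of the type on the right

def decompose(env_preds: Iterable[str], ctx: str, left: str, right: str) -> Clause:
    """Turn `env; ctx |- {v | left} <: {v | right}` into a clause."""
    return Clause(frozenset(env_preds) | {ctx},
                  frozenset({left}),
                  frozenset({right}))

# Refinements contributed by the environment for s and x (as strings).
env_preds = ["p_s(s)", "q_{s,x}(x)"]
first = decompose(env_preds, "s = 1", "q_{s,v}(v)", "r_{s,x,v}(v)")
print(sorted(first.context), sorted(first.left), sorted(first.right))
```

Keeping the context and the left refinement in separate sets reflects the distinction between the first two components of a clause that the solving algorithm relies on.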


From the clauses obtained as above, GRATEN tries to find a solution for the
predicate variables using the algorithm presented in Algorithm 1.

The algorithm processes the constraints by first trying to find a solution for
predicate variables that occur on the right-hand side of a clause
$\Phi_1 \wedge \Phi_2 \Rightarrow \Phi_3$ (Lines 6-10), and then on the
left-hand side of a clause (Lines 11-15), and repeats this until either all of
the constraints are solved or the constraints cannot be processed any further
(Line 4). In Line 8 and Line 13, the set of program variables $\tilde{x}$ of a
predicate variable $p_{\tilde{x}}$ is used to assign the predicates to the
predicate variables⁷.
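
The filtering step used in Line 8 and Line 13 (taking the maximal subset of predicates that only mention the variables a predicate variable depends on) can be sketched as below. The dictionary mapping each predicate to its set of free variables and the name `maximal_subset` are assumptions of this sketch.

```python
from typing import Dict, FrozenSet, Set

def maximal_subset(preds: Dict[str, Set[str]], allowed: Set[str]) -> FrozenSet[str]:
    """Keep the predicates whose free variables all lie in `allowed`."""
    return frozenset(p for p, free_vars in preds.items() if free_vars <= allowed)

# Candidate predicates with their (hand-written) free-variable sets.
candidates = {
    "len(v.shape) = 1": {"v"},
    "v.shape = [nth(0, x.shape) / s]": {"v", "x", "s"},
}
print(maximal_subset(candidates, {"s", "v"}))
```

For a predicate variable that depends only on `s` and `v`, the second candidate is discarded because it mentions `x`, so an irrelevant predicate is never assigned to it.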

During the iteration, the constraints need to be occasionally updated with
the current solutions $\theta$ by applying the substitution $\theta$ to all the
predicates in the constraints. After that, we also simplify the set of clauses
(with simplify in Algorithm 1) by removing the predicates from the right-hand
side of a clause that trivially follow from the left-hand side, and by removing
clauses whose right-hand side is empty. For example, a clause
$\{\} \wedge \{x = 1\} \Rightarrow \{x = 1\}$ is simplified to
$\{\} \wedge \{x = 1\} \Rightarrow \{\}$, and then removed from the set of
clauses.


7 The set of program variables used in predicates is defined following the standard
definition of free variables, except that the set of program variables used in a
predicate variable $p_{\tilde{x}}$ is defined as $\tilde{x}$.
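
A minimal sketch of the simplification step is given below. It assumes that predicates are strings and that "trivially follows" is approximated by literal membership in the left-hand side; the actual check in GRATEN may be more permissive. The type alias `Clause` and this particular `simplify` are illustrative only.

```python
from typing import FrozenSet, List, Tuple

Clause = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]

def simplify(clauses: List[Clause]) -> List[Clause]:
    """Drop goals that already occur on the left, then drop empty clauses."""
    result = []
    for ctx, left, right in clauses:
        remaining = right - (ctx | left)
        if remaining:  # clauses with an empty right-hand side are removed
            result.append((ctx, left, remaining))
    return result

trivial = (frozenset(), frozenset({"x = 1"}), frozenset({"x = 1"}))
pending = (frozenset({"s = 1"}), frozenset({"x = 1"}),
           frozenset({"x = 1", "r(v)"}))
print(simplify([trivial, pending]))
```

The first clause corresponds to the example in the text and disappears entirely, while the second keeps only the goal `r(v)` that does not occur on its left-hand side.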

To illustrate the behavior of Algorithm 1, consider applying it to the clauses
(1). During the first iteration of the while loop (Line 4), the first for loop
(Line 6) exits with an empty $\theta$, as $r$ appears on the right-hand side of
multiple clauses and cannot be resolved here due to the check at Line 7. In the
next for loop (Line 11), $\theta$ is updated to:

$$[q_{s,v}(v) \mapsto (\mathtt{len}(v.\mathtt{shape}) = 1 \wedge q'_{s,v}(v))] \qquad (2)$$

where $q'_{s,v}(v)$ is a fresh predicate variable, and the constraints $c$ would
be updated as follows.

$$\begin{aligned}
\{p_s(s), \mathtt{len}(x.\mathtt{shape}) = 1, q'_{s,x}(x), s = 1\} \wedge \{\mathtt{len}(v.\mathtt{shape}) = 1 \wedge q'_{s,v}(v)\} &\Rightarrow r_{s,x,v}(v) \\
\{p_s(s), \mathtt{len}(x.\mathtt{shape}) = 1, q'_{s,x}(x), s \neq 1\} \wedge \{v.\mathtt{shape} = [\mathtt{nth}(0, x.\mathtt{shape})/s]\} &\Rightarrow r_{s,x,v}(v)
\end{aligned}$$

The while loop exits after the second iteration, as no new predicate variables
can be added to $\theta$ and $c = c'$ holds. Thus, we only obtain (2) from
Algorithm 1. After the inference, GRATEN assigns true to the remaining predicate
variables $p$, $q'$ and $r$.


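Algorithm 1 Algorithm for calculating the solutions $\theta$ to predicate
variables from constrained Horn clauses $c$.

Input: constrained Horn clauses $c$

Output: the mapping $\theta$ from predicate variables to their solutions (predicates)

 1: procedure SOLVE($c$)
 2:   Let $\theta$ be an empty substitution
 3:   $c' \leftarrow c$
 4:   while $c \neq \emptyset$ and $c \neq c'$ do
 5:     $c' \leftarrow c$
 6:     for every clause of the form $\Phi_1 \wedge \Phi_2 \Rightarrow p_{\tilde{x}}(\tilde{y})$ in $c$ do
 7:       if $p_{\tilde{x}}(\tilde{y}) \notin \Phi_3'$ for any other $\Phi_1' \wedge \Phi_2' \Rightarrow \Phi_3'$ in $c$ then
 8:         Let $\Phi_2'$ be the maximal subset of $\Phi_2$ that only uses variables in $\tilde{x}$
 9:         $\theta \leftarrow [p_{\tilde{x}}(\tilde{y}) \mapsto \bigwedge \Phi_2'] \circ \theta$    ▷ $\circ$ is a composition of mappings.
10:     $c \leftarrow \mathrm{simplify}(\theta c)$    ▷ simplify(·) is described in the main text.
11:     for $\Phi_1 \wedge \Phi_2 \Rightarrow \Phi_3$ in $c$ do
12:       for every predicate variable $p_{\tilde{x}}(\tilde{y})$ in $\Phi_1 \cup \Phi_2$ do
13:         Let $\Phi_3'$ be the maximal subset of $\Phi_3$ that only uses variables in $\tilde{x}$
14:         Let $q_{\tilde{x}}(\tilde{y})$ be a fresh predicate variable
15:         $\theta \leftarrow [p_{\tilde{x}}(\tilde{y}) \mapsto (\bigwedge \Phi_3') \wedge q_{\tilde{x}}(\tilde{y})] \circ \theta$
16:         $c \leftarrow \mathrm{simplify}(\theta c)$    ▷ Also updates the remaining items iterated by L11.
17:   return $\theta$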


5 Experiment 


This section reports on experiments to evaluate the effectiveness of our approach
by running our tool GRATEN on the example programs bundled in the OCaml-Torch
library [4]. We have also checked how type annotations changed the inference
results.


5.1 Methods 


Input and Output of GRATEN GRATEN takes an OCaml program and 
performs type checking with its best-effort type inference. If the type checking 
is successful, it returns the inferred types of top-level variables defined in the 
program, and the source program with necessary assertions inserted. Otherwise, 
the type checking fails with an error message. 

The assertions are inserted into the output program only when they are
needed. Namely, at a place where the consistent subtyping
$\Gamma; \varphi \vdash \tau_1 \lesssim \tau_2 \rightsquigarrow N$ is used, an
assertion is inserted only when $\Gamma; \varphi \vdash \tau_1 <: \tau_2$ does
not hold (see Proposition 1).

Besides the source program, GRATEN also reads the types of the library 
functions (including those of OCaml-Torch) from manually prepared stub files. 
For example, the type of tr (matrix transpose function) is defined as follows. 


val tr : x:{ v:tensor | len v.shape = 2 } 
-> tensor([nth 1 x.shape; nth 0 x.shape]) 
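
To make the meaning of this signature concrete, the following Python sketch computes the same shape relation dynamically. The function `tr_shape` is a hypothetical helper for illustration and not part of OCaml-Torch or GRATEN; it only mirrors what the refinement type states, namely that the argument must have rank 2 and that the two dimensions are swapped in the result.

```python
from typing import List

def tr_shape(shape: List[int]) -> List[int]:
    """Shape-level reading of `tr`: rank-2 input, dimensions swapped."""
    if len(shape) != 2:
        # Corresponds to the refinement `len v.shape = 2` on the argument.
        raise ValueError(f"expected a rank-2 shape, got {shape}")
    # Corresponds to `[nth 1 x.shape; nth 0 x.shape]` in the return type.
    return [shape[1], shape[0]]

print(tr_shape([2, 3]))
```

With the refinement type, the rank check is discharged statically where possible instead of being raised as a run-time error as in this sketch.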


Note that describing the types of some higher-order OCaml-Torch functions 
requires the polymorphic extension, which we sketch in the full version [13]. For 
example, the type of Layer.forward is defined as follows. 


∀b1:bool, b2:bool.
(x:{x:tensor | b1} → {y:tensor | b2}) → x:{x:tensor | b1} → {y:tensor | b2}


GRATEN handles such types by instantiating the quantified parameters (b1 and
b2 in the above case) with fresh predicate variables.


Test Cases We applied GRATEN to the programs under the examples/ directory of
the repository of OCaml-Torch⁸. The list of programs tested is shown in Table 1.
Since some programs use features of OCaml or OCaml-Torch that are not yet
supported by GRATEN, they were modified not to use such features without
changing the structure of the neural network. The major modifications made to
the target programs are listed below. Other smaller syntactic modifications can
be found in the supplementary materials.


(M1) Replacing or removing type-polymorphic functions. Some functions that
create loops, such as List.foldl, are replaced with recursive functions. Others,
such as no_grad, are replaced with the type-instantiated versions.

(M2) Removing the use of non-integer lists, especially tensor lists and layer⁹
lists. As a result, two list-taking primitive functions are removed. One is
Tensor.cat, which takes a list of tensors and returns their concatenation. It is
replaced with a variant Tensor.cat_ which takes only two tensors. The
other is Layer.sequential, which takes a list of layers and returns a layer
that sequentially applies all the input layers.

(M3) Replacing mutable float objects with 0-dimensional tensors, as GRATEN
does not support reference types.


8 https://github.com/LaurentMazare/ocaml-torch/tree/a6499811f4/examples
9 Functions that take a tensor and return a tensor.

(M3) Replacing mutable float objects with 0-dimensional tensors, as GRATEN 
does not support reference types. 


As an example of (M1) and (M2), consider the following function, which 
creates a list of linear layers and returns a new layer that applies all the layers 
in the list. 


let f vs ~num_layers =
  List.init num_layers ~f:(fun i -> Layer.linear vs ~input_dim:(i+1) (i+2))
  |> Layer.sequential

The i-th layer in the list takes a tensor whose last dimension is size i+1, and 
returns a tensor of the same shape except that the last dimension is changed 
to i+2. By the modifications (M1) and (M2), the above function definition is 
replaced with: 


let f vs ~num_layers =
  let rec loop i xs =
    if i = 0
    then Layer.id xs
    else loop (i-1) xs ~is_training |> Layer.linear vs ~input_dim:i (i+1)
  in Layer.of_fn (loop num_layers)
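
The shape behaviour of the rewritten function can be traced with a small Python sketch that works on shapes only. The helpers `linear_shape` and `loop_shape` are hypothetical and merely mimic the recursion above: a linear layer with input dimension `i` and output dimension `i+1` is modelled as a check on the last dimension followed by its replacement.

```python
from typing import List

def linear_shape(shape: List[int], input_dim: int, output_dim: int) -> List[int]:
    """A linear layer replaces the last dimension input_dim by output_dim."""
    if not shape or shape[-1] != input_dim:
        raise ValueError(f"last dimension of {shape} is not {input_dim}")
    return shape[:-1] + [output_dim]

def loop_shape(i: int, shape: List[int]) -> List[int]:
    """Shape-level counterpart of `loop i xs` in the rewritten program."""
    if i == 0:
        return shape  # Layer.id leaves the tensor unchanged
    return linear_shape(loop_shape(i - 1, shape), i, i + 1)

print(loop_shape(3, [8, 1]))
```

Starting from a tensor whose last dimension is 1, `num_layers` recursive steps yield a last dimension of `num_layers + 1`, which agrees with the list-based definition that the recursion replaces.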


Some programs in the examples/ directory are excluded from the test cases 
for the following reasons. 

- neural_transfer uses a library function Vgg.vgg16_layers whose type
cannot be described in GRATEN; the relation between its inputs and its output
tensor's shape could not be expressed in the syntax supported by GRATEN.

- The programs dqn.ml, dqn_atari.ml and dqn_pong.ml in
reinforcement-learning use queues, which are not supported in GRATEN yet.

- env_gym_pyml.ml and venv_env_gym_pyml.ml under
reinforcement-learning use Python objects, whose verification is outside
the scope of this paper.

- reinforcement-learning/policy_gradient.ml uses mutable lists, which
cannot be replaced with another datatype already supported in GRATEN.

- yolo/darknet.ml and translation/lang.ml use hash tables, which are
not supported in GRATEN yet.

- translation/dataset.ml and translation/lang.ml are irrelevant as
tensor objects do not appear in them.


Evaluation We evaluated the best-effort inference of GRATEN on the following 
three aspects. 

First, we counted the assertions that GRATEN inserted into each target
program. Since the assertions indicate the program points that could fail at
runtime, the user of GRATEN would wish to pay attention to the location and the
number of inserted assertions and try to decrease them.

Second, we counted the minimum number of type annotations required to 
type-check the program with minimum assertions inserted. This is for evaluating 
the realistic programmers’ burden of trying to statically verify the program with 
type annotations. The annotations were added in such a way that the types of the 
functions do not lose the original generality. The type annotations are counted 
by the number of refinement types with non-true refinement predicates in them. 
For example, the following annotation counts as 3 because the refinement of the 
input tensor and the two output tensors are not true, but the refinement of the 
annotation of the second argument bool is true. 


tensor([x]) -> ~is_training:bool -> tensor([x]) * tensor([x]) 
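
The counting rule can be stated as a few lines of Python. The representation of an annotation as a list of (base type, refinement) pairs and the name `count_annotations` are assumptions of this sketch; `tensor([x])` is read here as a tensor type refined by a constraint on its shape.

```python
from typing import List, Optional, Tuple

def count_annotations(parts: List[Tuple[str, Optional[str]]]) -> int:
    """Count the component types whose refinement is not `true`."""
    return sum(1 for _, refinement in parts
               if refinement not in (None, "true"))

# tensor([x]) -> ~is_training:bool -> tensor([x]) * tensor([x])
annotation = [
    ("tensor", "v.shape = [x]"),  # input tensor
    ("bool", None),               # ~is_training, refinement is true
    ("tensor", "v.shape = [x]"),  # first output tensor
    ("tensor", "v.shape = [x]"),  # second output tensor
]
print(count_annotations(annotation))
```

The unrefined `bool` argument contributes nothing to the count, so the example annotation counts as 3.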


Third, we also measured the time taken by GRATEN to analyze the unannotated
and annotated programs. The experiments were conducted on a Linux machine with
a 12-core Intel i5-11400 (2.60 GHz). GRATEN is implemented in Haskell with GHC
version 9.0.2.


5.2 Experimental Results 


Table 1 summarizes the experimental results. We analyze those results by the 
following three aspects: assertions, type annotations and analysis time. 


Inserted Assertions Out of the 26 programs tested, 10 programs required
no type annotations to type-check without assertions, and another 7 programs
type-checked without assertions after adding appropriate type annotations. For
the remaining 9 programs, such as gan/began.ml and gan/gan_stability.ml, we
could not eliminate all assertions, although some of them were removed after
adding type annotations. The remaining assertions were due to the imprecise
type signatures of some library functions. For instance, Torch.Serialize.load
is a function that loads a tensor from a file, and its type signature is defined
as follows.


val load : ~filename:string -> tensor 


The return type of load is simply defined as tensor since it is impossible to 
assume any properties about its shape. As a result, an assertion was inserted to 
check if the loaded tensor satisfies the requirement to run the program without 
uncaught errors. Even adding type annotations to the loaded tensor does not 
remove the assertion. 

Some other functions are given imprecise types due to GRATEN’s immature 
support of polymorphic data types. For example, the type of Tensor.stack is 
defined as follows because GRATEN does not effectively support non-integer lists 
yet. Refining the return types of such functions is left as future work. 


val stack : ~dim:int -> list (tensor) -> tensor

                                           Unannotated         Annotated
Location under examples/             LOC   time (s)  #assert   #annot  time (s)  #assert
char_rnn/char_rnn.ml                  98     1.647        1        2     0.664        0
cifar/cifar_train.ml                  72     0.311        0        -         -        -
cifar/densenet.ml                    116     2.603        6        2     1.304        0
cifar/fast_resnet.ml                  64     0.293        0        -         -        -
cifar/preact_resnet.ml                85     2.535        8        5     0.346        0
cifar/resnet.ml                       78     2.597        8        4     0.396        0
gan/began.ml                         220     1.581        1        -         -        -
gan/gan_stability.ml                 224     4.441       40        2     1.410        2
gan/mnist_cgan.ml                    117     0.498        1        -         -        -
gan/mnist_dcgan.ml                   136     1.418        4        2     0.500        0
gan/mnist_gan.ml                      83     0.308        0        -         -        -
gan/progressive_growing_gan.ml       118     0.734        0        -         -        -
gan/relativistic_dcgan.ml            171     0.659        1        -         -        -
jit/load_and_run.ml                   16     0.214        1        -         -        -
min-gpt/mingpt.ml                    207     3.036        8        6     2.686        0
mnist/conv.ml                         53     0.250        0        -         -        -
mnist/linear.ml                       50     0.235        0        -         -        -
mnist/nn.ml                           39     0.210        0        -         -        -
pretrained/finetuning.ml              69     0.294        0        -         -        -
pretrained/predict.ml                 68     0.303        2        -         -        -
reinforcement-learning/a2c.ml        105     0.418        0        -         -        -
reinforcement-learning/ppo.ml        129     0.438        0        -         -        -
reinforcement-learning/rollout.ml     91     0.734        9        5     0.425        1
translation/seq2seq.ml               258     3.800       11       34     1.023        3
vae/vae.ml                            78     1.233        4       10     0.312        0
yolo/yolo.ml                         144     1.027        4        1     0.985        3


Table 1. Results of running GRATEN on the test cases. The second column is the size
of the program after the modification. The third and fourth columns are the results
for unannotated programs. The third column is the duration of the type-checking and
the fourth column is the number of assertions inserted. The fifth to the seventh
columns are for the annotated programs. The fifth column is the number of annotations
added to the program.

Patterns of Added Type Annotations As we added type annotations to
the test cases, we observed that the program points that require type
annotations have similarities. All of the type annotations fall into one of the
following patterns.


(P1) Branches i.e., if expressions and match expressions with multiple branches 
(e.g., Figure 4 in Section 1). 

(P2) Recursive functions. For example, loop in translation/seq2seq.ml is anno- 
tated as follows. 


let rec loop
  : ~state:tensor([1; enc.hidden_size])
  -> ~prevs:list ({ v:tensor | prod v.shape = 1 })
  -> ~max_length:int -> list ({ v:tensor | prod v.shape = 1 })
  = fun ~state ~prevs ~max_length -> ...


(P3) Higher-order shape-polymorphic arguments. For example, sample in 
char_rnn.ml is annotated as follows. 


let sample ~dataset ~lstm
  ~linear:(linear : x:{ v:tensor | last v.shape = hidden_size }
                    -> tensor(init x.shape @ [dataset.labels]))
  ~device = ...


(P4) Definition of record types. The current implementation of GRATEN expects 
that the definition of record types describes the refinement types of each 
field. 

(P5) Imprecise type signatures of primitive functions, or user-defined functions of 
dependent modules. For example, translation/seq2seq.ml has the following 
type annotation since the return type of Tensor.stack is only inferred to 
be tensor due to its imprecise type signature. 


let enc_outputs : tensor([1; nth 1 v.shape; enc.hidden_size]) = 
Tensor.stack enc_outputs ~dim:1 


The statically inferred type of enc_outputs here is tensor([1;
enc.hidden_size]) list, so we would not need this type annotation if
the type signature of Tensor.stack were appropriately defined. Since it is not
possible to statically verify the correctness of this kind of annotation,
assertions would still be inserted after adding these annotations.


The first three patterns indicate that GRATEN’s current best-effort type infer- 
ence does not effectively infer precise refinements for branches, recursive func- 
tions and higher-order shape-polymorphic arguments. The fourth pattern (P4) 
would be inevitable when using record types. It remains as future work to exempt 
users from having to add type annotations for (P5). With such improvements, 
we believe that it will become easier to find program points that require type 
annotations for better inference. 
Number of Type Annotations There is no correlation between the number of
assertions inserted into the unannotated program and the number of annotations
that must be added to the program to minimize the number of assertions.

For example, adding two type annotations to gan/gan_stability.ml re- 
sulted in removing 38 assertions. This is because GRATEN inferred an imprecise 
type for a helper function resnet_block without any type annotations, and 
it degraded the precision of the inference for the 24 callers of the function. 
Meanwhile, translation/seq2seq.ml required comparatively many type annotations
because it has many definitions of record types and several recursive functions
with multiple inputs.


Analysis Time For all 11 annotated programs, GRATEN’s type checking
of the annotated program was faster than that of its unannotated counterpart. This
is presumably because having more static information made it easier for GRATEN to
infer more precise types and to resolve more subsumption constraints.


5.3 Discussions 


In this subsection, we discuss the strengths, weaknesses and our perspective on 
the future development of our system. 


Performance of Best-Effort Inference As reported in the previous sub- 
section, the best-effort inference of GRATEN does not infer precise types for 
branches, recursions and higher-order shape-polymorphic arguments. While this 
may seem unsatisfying at a glance, the aim of this research is not to develop a 
perfect inference algorithm, but to propose a method that can work on unanno- 
tated programs and allows users to work interactively with the type checker to 
gradually add type annotations. In this respect, we believe that GRATEN has
achieved desirable results since it will be easy for the user to find out where to 
add type annotations. This is because (1) the inserted assertions can inform the 
user of the location of potential dynamic errors, and (2) all of the required type 
annotations would fall into one of the patterns listed in the previous section and 
thus should be predictable. 


Lists of Tensors and Layers As of now, the refinement inference for lists 
in GRATEN is limited to integer lists. Meanwhile, lists of tensors or lists 
of functions are commonly used in deep learning programs: Tensor.cat and 
Tensor.stack both take a list of tensors and return their concatenation, and 
Layer.sequential takes a list of layers (functions that take and return a tensor)
and returns their composition. 

A potential approach to support these library functions would be to add new
refinement predicates for tensor lists or layer lists. For example, we can add a
predicate composable(x, S1, S2) which means that the composition of a list of
layers x takes a tensor of shape S1 and returns a tensor of shape S2. The type of
Layer.sequential would be expressed with the shape-polymorphic extension
(see the full version [13]) as follows.


val sequential : forall S1 S2.
  { v:list(tensor -> tensor) | composable(v, S1, S2) }
  -> tensor(S1) -> tensor(S2)


To infer the composable predicate for layer lists in practice, we would need to change
the type-instantiated versions of list-manipulating functions as well. For instance, 
the type of the cons function for layers would need to be defined as follows. 


val cons_layers :
  forall S1 S2 S3. (tensor(S1) -> tensor(S2))
  -> { v:list(tensor -> tensor) | composable(v, S2, S3) }
  -> { v:list(tensor -> tensor) | composable(v, S1, S3) }
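
The intended meaning of the composable predicate, and the way the cons rule above preserves it, can be illustrated with a small executable model. The following OCaml sketch is only an illustration under stated assumptions: the record type layer, which describes a layer solely by one fixed input shape and one fixed output shape, and the function composable are hypothetical names introduced here, and the treatment of the empty list as the identity is our assumption, not something the text specifies. It is not GRATEN's encoding of the predicate.

```ocaml
(* Hypothetical runtime model: a layer is described only by the shape it
   consumes and the shape it produces. *)
type shape = int list
type layer = { input : shape; output : shape }

(* composable layers s1 s2 holds when feeding a tensor of shape s1 through
   the layers, first to last, yields a tensor of shape s2.
   Assumption: the empty list behaves as the identity. *)
let rec composable (layers : layer list) (s1 : shape) (s2 : shape) : bool =
  match layers with
  | [] -> s1 = s2
  | l :: rest -> l.input = s1 && composable rest l.output s2

let () =
  let flatten = { input = [28; 28]; output = [784] } in
  let dense = { input = [784]; output = [10] } in
  (* The tail is composable from [784] to [10] ... *)
  assert (composable [dense] [784] [10]);
  (* ... so consing a layer from [28; 28] to [784] gives a list that is
     composable from [28; 28] to [10], mirroring the type of cons_layers. *)
  assert (composable (flatten :: [dense]) [28; 28] [10]);
  (* Layers in the wrong order are not composable. *)
  assert (not (composable [dense; flatten] [784] [10]))
```

In this model, the second assertion corresponds to instantiating S1, S2 and S3 in cons_layers with [28; 28], [784] and [10].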


Reporting Incorrect Type Annotations Since our type system sees the 
standard refinement types as gradual, some users might find the behavior of 
GRATEN unexpected in some cases. Consider the following function f which 
takes a matrix and returns a matrix obtained by transposing the input. Suppose 
that the programmer mistakenly annotated the return value of f to have the 
same shape as the input matrix. 


let f x = (tr x : tensor(x.shape)) 


Although this type annotation does not hold in general, this program is not 
rejected by our type system because the annotation can hold if the input x is a 
square matrix. GRATEN would output the following program with an assertion. 


let f x = (fun y -> assert(y.shape = x.shape); y) (tr x) 


To avoid such a situation, it would be possible to extend the type system with 
types with fully statically known refinements, and let the annotated types be 
interpreted as such. 
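
The runtime behavior of the output program can be observed with ordinary OCaml values. The sketch below is a toy model under our own assumptions: matrices are represented as arrays of rows, and shape and tr are hypothetical helpers written for this illustration, not the OCaml-Torch functions. The function f has the same form as the program that GRATEN outputs above.

```ocaml
(* Toy model: a matrix is an array of rows; shape returns [rows; cols]. *)
let shape m =
  let rows = Array.length m in
  let cols = if rows = 0 then 0 else Array.length m.(0) in
  [rows; cols]

(* Transpose of a rectangular matrix. *)
let tr m =
  let rows = Array.length m in
  let cols = if rows = 0 then 0 else Array.length m.(0) in
  Array.init cols (fun j -> Array.init rows (fun i -> m.(i).(j)))

(* Same form as the program with the inserted assertion. *)
let f x = (fun y -> assert (shape y = shape x); y) (tr x)

let () =
  (* A square input satisfies the (incorrect) annotation. *)
  let square = [| [| 1.; 2. |]; [| 3.; 4. |] |] in
  assert (f square = [| [| 1.; 3. |]; [| 2.; 4. |] |]);
  (* A 1x3 input violates it, so the inserted assertion fails at runtime. *)
  let wide = [| [| 1.; 2.; 3. |] |] in
  match f wide with
  | _ -> failwith "expected the inserted assertion to fail"
  | exception Assert_failure _ -> ()
```

The mistake in the annotation therefore surfaces only when f is applied to a non-square matrix, which is the behavior that some users might find unexpected.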


6 Related Work 


Tensor Shape Checking in Deep Learning Programs. The problem of
tensor shape checking has been studied for decades in various contexts such as
numeric analysis [7,2] and array-oriented languages with rank polymorphism
[29,28,12]. Tensor shape checking for deep learning programs is still a new
challenge because the shapes can be more complicated, and a variety of methods 
have been proposed both in academia and in industry. 

Some tools statically check tensor shapes with advanced type systems. Hask- 
torch [3] is a Haskell binding of libtorch [20] which provides a mode that stati- 
cally checks tensor shapes. Since they use the type-level programming feature of 
Haskell to implement the tensor shapes, tensor shapes are not first-class objects. 
As a result, programs such as the one in Figure 1 cannot be expressed since it is
impossible to define the function f whose type depends on the first-class object 
s. Relay [25,24] is an IR for deep learning compilers with a rich type system for 
tensor shape with type inference. Both Relay and Hasktorch support dynamic 
shape as a wild card in the static shape checking. 

Apart from the type-based verification methods, some tensor shape error
detection tools also take a static approach. Pythia [17,6] statically detects shape
faults in TensorFlow [1] programs by keeping track of the tensor shapes throughout
the program using value-flow analysis. Shapes are tracked in a best-effort
manner, allowing the shape inference results to be “unknown” in some
cases. The analysis crucially relies on the programming practice in TensorFlow
of annotating tensor shapes as much as possible.

Other static checking tools use symbolic execution to collect constraints from
the program and verify them with a solver; Tensors Fitting
Perfectly [21] and PyTea [15] take this approach. Both methods remove loops
from the program in an ad-hoc manner based on a reasonable assumption about
the program.

Lastly, some took dynamic approaches to provide lightweight shape fault de- 
tection. ShapeFlow [31] is an abstract interpreter of TensorFlow programs; it 
shares the same APIs as TensorFlow but only calculates the shape of tensors. 
Users can run the analysis by replacing the import of TensorFlow with Shape- 
Flow in the target program, which executes more efficiently than the original 
TensorFlow program. Elichika [14] uses a similar method to ShapeFlow with 
a feature to display the interpreted shapes with a symbolic expression. These 
dynamic approaches enable quick analysis and require no type annotations, but 
provide no guarantee for untested inputs. 


Static and Dynamic Checking for Refinement Types. Earlier work on
dependent type systems focused on decidable type checking and inference with
a restricted refinement logic [10,34,33,26]. Dynamic checking with contracts [19,9]
offers expressive verification that cannot be covered with a static type system, 
but at a cost of runtime overhead. Naturally, the combination of static and 
dynamic checking has been actively explored by the successors of both parties. 

Hybrid type checking [16], which our work is based on, extends the purely- 
dynamic method of using contracts by verifying specifications statically as much 
as possible. This method differs from ours in that it inserts a dynamic check 
only when the subtyping constraint is not proven to be valid or invalid. As a 
result, this method statically rejects the incorrectly annotated program that we 
discussed in Subsection 5.3, while our method accepts it with a dynamic check in 
the hope that a more precise type annotation will remove the need for a dynamic 
check. Our method can be understood as a variant of hybrid type checking with 
a focus on being gradual in adding type annotations. 

The application of gradual typing to dependent type systems has also been 
studied [18,8]. Especially, gradual refinement types [18] is very similar to our type 
system in that it gradualizes only the predicate part of a refinement type system 
and the underlying simple type is static. One of the differences is that their
system distinguishes statically-unknown refinement predicates from statically-
known ones, while our system assumes that any refinement predicate can have
a statically-unknown portion. For example, consider the following program:
let f x (y:{v:int | true}) = x / y

This program is rejected in their system because the type annotation of y
indicates that the programmer is confident that y can be any integer, including
0; otherwise, the type annotation should have been {v : int | ⋆}. Meanwhile,
our system interprets the type annotation as not precise enough and accepts the
program by inserting a dynamic check on y. Intuitively, {x : B | φ} in our type
system translates to {x : B | φ ∧ ⋆} in gradual refinement types [18].

The type inference for gradual refinement types has been studied by Vazou et 
al. [30]. Their work restricts the refinement to liquid predicates [26] to maintain 
the decidability, while our work does not impose such a limitation. 


7 Conclusion and Future Work 


We presented an extension to the standard refinement type system which can be
viewed as a gradual type system. The essence of this extension is the introduction
of the consistent subtyping relation, which inserts into the source program
assertions that check statically-unverified properties at runtime. We also showed
that the extended type system satisfies the refined criteria of gradual typing.

We then applied this type system for verifying tensor shapes with best-effort 
type inference. This application makes use of the property of the proposed type 
system that allows us to cover the limitation of the static best-effort analysis 
with dynamic checks. We also implemented a prototype type checker GRATEN
and applied it to some of the example programs publicly available in the
OCaml-Torch repository. We observed that, thanks to the best-effort type inference,
users would not need to write many type annotations to statically type-check
the whole program, and it would not be difficult to find where to add type
annotations to improve the inference.

We conclude with some ideas for future work. 

— Extension with type polymorphism. As we observed in the experiments, 
type polymorphic functions are frequently used in realistic programs. Extending 
our type system with ML-style type polymorphism would make the type checker 
more practical. 

— Application for imperative languages with a dynamic type system, like 
Python. In this paper, we have chosen OCaml as the target of the prototype to 
ensure that the input program is statically-typed. Python would, however, be a 
more attractive target since it is widely used in the machine learning community. 
Abstract. We consider a simple yet expressive λ-calculus equipped with
references, effect handlers, and dynamic allocation of effect labels, and 
whose operational semantics does not involve coercions or rely on type in- 
formation. We equip this language with a type system that supports type 
and effect polymorphism, allows reordering row entries and extending a 
row with new entries, and supports (but is not restricted to) lexically 
scoped handlers. This requires addressing the issue of potential aliasing 
between effect names. Our original solution is to interpret a row not only 
as a permission to perform certain effects but also as a disjointness re- 
quirement bearing on effect names. The type system guarantees strong 
type soundness: a well-typed program cannot crash or perform an un- 
handled effect. We prove this fact by encoding the type system into a 
novel Separation Logic for effect handlers, which we build on top of Iris. 
Our results are formalized in Coq. 


1 Introduction 


Effect handlers [30,17] can be viewed as a generalization of exception handlers. 
Like raising an exception, performing an effect interrupts the normal flow of 
execution and transfers control to a handler. Unlike an exception handler, an 
effect handler gains access to a delimited continuation, which represents the 
fragment of the evaluation context comprised between the point where the effect 
was performed and the point where the effect handler was installed. Invoking 
this continuation resumes the computation whose execution was suspended by 
performing an effect. 

To allow programmers to exploit several independent effects simultaneously, 
it is desirable for effects to have names. Each effect handler handles a specific 
name, or a specific set of names. When an effect is performed, the name of this 
effect determines which handler is selected. This idea immediately gives rise to 
several key questions about names. What are they: strings, variables, addresses? 
Where are they defined? What is their scope? 

In the simplest approach [2,14,22], effect names are global. All possible names 
are predefined and are in scope everywhere. This approach is simple but unsatis- 
factory in terms of expressiveness and modularity: an accidental collision, where 
two unrelated pieces of code happen to use the same effect name, can have 
surprising unintended consequences. We illustrate this problem later on (§2). 
To remedy this problem, several authors have proposed to change the nature
of names. Their work falls broadly in two categories: the “lexical approach” and 
the “generative approach”. 

The “lexical approach” introduces local effect names with lexical scope. One 
can then think of an effect name essentially as a variable. Tunneled exceptions [42]
and lexically scoped handlers [41,6,7,27] fall in this approach. In some of these
proposals, the local effect name is never exposed to the user, but a “capability” to 
perform the effect is made available via a local variable. A potential pain point of 
this approach is that one must somehow ensure that a name or capability cannot 
escape its scope: this must be guaranteed by some combination of syntactic 
restrictions, runtime tests, and static typing rules. 

The “generative approach” consists in allowing new effects to be generated 
afresh at runtime. This requires introducing a distinction between effect labels, 
which are allocated at runtime, and effect names, which are variables (with 
lexical scope) that the programmer uses to refer to effect labels. This is similar 
to the distinction between memory locations and variables that is traditionally 
used in the operational semantics of mutable references [29]. This approach has 
long been in use for exceptions in Standard ML [25] and OCaml [24], and is 
used also for effects in OCaml 5. It is powerful: in particular, it can simulate 
lexically scoped handlers.¹ However, it introduces several pitfalls of its own.
First, it creates the possibility of nameless effects, that is, the possibility that 
there is no static effect name for a certain effect label. Second, it introduces 
the possibility of aliasing between effect names, that is, the possibility that two 
distinct effect names denote the same effect label. Aliasing creates a challenge 
for type system designers: if one cannot statically tell whether two effect names 
denote distinct labels, then it seems unclear how one can propose a sound and 
precise type discipline. 

At least three ways of evading or addressing this challenge appear in the 
literature. 

First, several mainstream languages adopt the generative approach but avoid 
the aliasing challenge by offering a weak type soundness guarantee: a well-typed 
program cannot crash, but can halt due to an unhandled exception or effect. 
This is the case in Standard ML, where exceptions are untracked, and in OCaml, 
where exceptions and effects are untracked. It is also the case in Eff [3]. 

Second, a number of authors evade or resolve the aliasing challenge by altering 
the syntax and the operational semantics of the language. Instead of letting 
the correspondence between an effect and a handler be determined purely by 
the notion of equality of effect labels or effect names, they introduce coercions
that enable explicit disambiguation and collision avoidance. Examples include
Koka [21] as well as several papers by Biernacki et al. [4,5].

¹ This can be a source of confusion. A language that has “lexically scoped handlers”
can, technically, be presented in either of these two styles. Biernacki et al. [6] present
one semantics in each style, the “open semantics” and the “generative semantics”, and
prove an equivalence between them. Zhang and Myers [41] adopt what we believe is
a combination of lexically scoped handlers and implicit arguments, which they refer
to as “tunneling”, in their surface language. This language is then translated down
to a core language whose operational semantics is in the generative style.

Third, some authors evade the challenge by restricting the programming 
language in one or more ways, such as restricting attention to lexically scoped 
handlers [6,7] and forbidding first-class functions [7]. 

This sets the scene for this paper. We stick with the generative approach, 
which offers a simple and expressive semantics. We do not introduce coercions 
or otherwise alter the operational semantics. We do not restrict our attention to 
lexically scoped handlers. We address the aliasing challenge. 

We propose TES, a type-and-effect system that statically rules out unhandled 
effects. As in most previous work, the potential effects of an expression are de- 
scribed by a row, a concept introduced to type-check records and variants [32,38] 
and later applied to the analysis of exceptions [28] and effects [14,22]. Type and 
effect polymorphism are supported. Furthermore, a simple and powerful sub- 
sumption relation allows reordering the entries in a row and extending a row 
with new entries, without any side conditions. 

How is this possible? How is the aliasing challenge addressed? Our key idea 
is this: whenever a question about aliasing arises, require absence of aliasing. 
In other words, we interpret a row not just as a description of the names and 
types of the effects that may be performed, but also as a requirement that these 
names be pairwise distinct. For instance, if a typing judgment states that an
expression e has effect (s : ι ⇒ κ) · (s′ : ι′ ⇒ κ′), then this means not only that
e may perform the effects s and s′, but also that e requires the effect labels
denoted by s and s′ to be distinct. In the presence of effect polymorphism, if e
has effect (s : ι ⇒ κ) · θ, where θ is a row variable, then we take this to mean
that e requires the effect label denoted by s to lie outside the set of effect labels
denoted by θ. We adapt our typing and subtyping rules, where needed, so as to
be sound with respect to this new interpretation of rows. 

The reader may find our approach somewhat reminiscent of the manner in 
which the separating conjunction of Separation Logic [31] requires disjointness 
between the footprints of two formulae. Although this requirement may at first 
seem strong, experience has shown that Separation Logic is in fact concise and 
expressive. The examples that we present in Section 4.4 seem to suggest that our 
disjointness requirement is acceptable; we have not yet found examples where 
it is problematic. That said, we do not yet have practical experience with an 
implementation of this type system. 

TES offers a strong type soundness guarantee: a well-typed program cannot 
crash and cannot halt due to an unhandled effect. To prove this fact, we follow 
a semantic approach that has become popular in the last few years [1,20,19]. We 
introduce TESLOGIC, a novel variant of Separation Logic, constructed on top of 
Iris [16], which allows reasoning about programs in the presence of effects and 
handlers, multi-shot continuations, and dynamic allocation of effect labels. We 
prove that this logic is sound, and we provide an interpretation of TES’s typing 
rules in terms of TESLOGIC’s reasoning rules. All of our results are formalized 
in Coq, and our Coq formalization is available [36]. 
In summary, the main contributions of this paper are the design of TES,
a type system for TESLANG, a λ-calculus equipped with general references, effect
handlers, and dynamic allocation of effect labels, and a proof of type soundness, 
which is carried out via a semantic interpretation into a new program logic, 
TESLOGIC. 

In Section 2, we provide more background and examples about the semantics 
of effect handling: we discuss name collisions, effect coercions, lexically scoped 
handlers, and dynamic allocation of effect labels, and we justify why we wish 
to study a calculus where effect handling and dynamic allocation of effect labels 
are separate constructs. In Section 3, we present the syntax and operational 
semantics of TESLANG. In Section 4, we introduce TES and show a number 
of examples of constructions that TES is able to type-check. In Section 5, we 
present a brief overview of the proof of type soundness. Finally, we discuss the 
related work and conclude. 


2 A Panorama of Semantics for Effect Handlers 


The various mechanisms that we have mentioned so far, namely lexically scoped 
handlers, dynamic allocation of effect labels, and effect coercions, aim to resolve 
the basic problem of accidental collisions between effect names. Let us illustrate 
this problem with an example. 

Anticipating Section 3, we use a λ-calculus equipped with constructs to
perform and handle effects. The expression perform s v performs an effect with
effect name s and payload v. The expression handle e with s : h | r installs an
effect handler which monitors the execution of the subexpression e and which
handles the effects that carry the name s.² If e returns a value v, then the return
branch r is invoked and receives the value v as an argument. If e performs an 
effect with name s and with payload v, then the execution of e is suspended and 
control is transferred to the effect branch h, which receives the payload v and a 
continuation k representing the suspended computation. 

Let us now introduce the function bad_counter. In a system of simple types,
which does not keep track of effects, bad_counter expects a function ff of type
(α → β) → γ and returns a function of type (α → β) → γ × int. The intended
behavior of bad_counter ff is to produce a new function ff’ such that ff’ behaves 
like ff but at the same time counts how many times ff uses its argument. That 
is, for an arbitrary function f, the application ff’ f is expected to return a 
pair (v,n), where v is the result of the computation ff f and n is the number 
of invocations of f that have taken place during this computation. The function 
bad_counter is defined as follows: 


bad_counter ≜ λff f.
  (handle ff (λx. perform tick (); f x) with
   tick : λ_ k. λn. k () (n + 1) | λy. λn. (y, n)) 0


This code has a free effect name, tick. The function f is wrapped in a proxy 
which performs an effect named tick. This effect is handled by bad_counter; the
handler implements a memory cell (in state-passing style) to count the number
of ticks, that is, the number of calls made by ff to f.

² For simplicity, this construct selects just one name, as opposed to a set of names.

Unfortunately, because this function uses a fixed effect name, tick, it can 
exhibit an unintended behavior, caused by an accidental collision of effect names. 
The following use of bad_counter exhibits this issue: 


bad_counter (bad_counter (λf. f ())) (λ_. ())


Because the function λf. f () calls its argument once, one might expect the
above expression to return (((), 1), 1). Its actual result, however, is (((),2),0). In 
the interest of space, we omit an explanation of its operational behavior. The 
key reason why it behaves incorrectly is that the two instances of bad_counter 
use the same effect name. Each application of bad_counter installs a handler for 
the effect name tick. One handler is nested inside the other. As a result, the 
innermost handler intercepts two tick effects and the outermost handler never 
observes any effect, whereas what was naively intended was that each handler 
observes and handles one effect. As a result of the name collision, one of the 
effects is accidentally handled by the innermost handler. 
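
The collision can be reproduced in a language with effect handlers. The following is a minimal sketch in OCaml 5 (it assumes OCaml 5.0 or later, for the Effect module of the standard library); it is not TESLANG, and the global constructor Tick merely plays the role of the fixed effect name tick. The handler is a deep handler written in the same state-passing style as above: both the effect branch and the return branch produce a function of the current count.

```ocaml
(* Requires OCaml >= 5.0. *)
open Effect
open Effect.Deep

(* One global effect name, shared by every use of bad_counter. *)
type _ Effect.t += Tick : unit Effect.t

let bad_counter (ff : ('a -> 'b) -> 'c) (f : 'a -> 'b) : 'c * int =
  let run =
    match_with
      (* Wrap f in a proxy that performs Tick before each call. *)
      (fun () -> ff (fun x -> perform Tick; f x))
      ()
      { (* Return branch: pair the result with the final count. *)
        retc = (fun y -> fun n -> (y, n));
        exnc = raise;
        (* Effect branch: resume the computation with the count incremented. *)
        effc = (fun (type r) (eff : r Effect.t) ->
          match eff with
          | Tick ->
            Some (fun (k : (r, _) continuation) ->
              fun n -> continue k () (n + 1))
          | _ -> None) }
  in
  run 0

let () =
  (* A single counter behaves as intended. *)
  assert (bad_counter (fun f -> f ()) (fun _ -> ()) = ((), 1));
  (* Nested counters collide: the inner handler sees both ticks. *)
  assert (bad_counter (bad_counter (fun f -> f ())) (fun _ -> ())
          = (((), 2), 0))
```

In the nested use, the proxy built by the inner call performs Tick and then calls the proxy built by the outer call, which performs Tick again. Both effects are performed underneath the inner handler, which intercepts them, so the outer count stays at 0.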

To avoid or help avoid accidental collisions between names, the literature 
describes several mechanisms: (1) effect coercions, (2) lexically scoped handlers, 
which can be viewed as a restricted case of (3) dynamic allocation of effect labels. 
Let us now say a little more about these mechanisms. 


Effect coercions. An effect coercion modifies the manner in which an effect is 
matched with one of the enclosing handlers. Perhaps the simplest example is that 
of the lift coercion [4,5], but there are other forms of coercions in the literature, 
such as swap. Normally, performing an effect named s transfers control to the 
innermost enclosing handler that selects the name s. However, in a language 
with effect coercions, if there is a lift coercion between the point where the 
effect is performed and the innermost enclosing handler, then this handler is 
skipped and control is transferred instead to the next enclosing handler for the 
name s.³ Under such a semantics, a coercion can be employed to write a fixed
version of bad_counter: 


lift_counter ≜ λff f.
  (handle ff (λx. perform tick (); lift tick (f x)) with
   tick : λ_ k. λn. k () (n + 1) | λy. λn. (y, n)) 0


As desired, lift_counter (lift_counter (λf. f ())) (λ_. ()) returns the
value (((),1),1). One tick effect is intercepted by the innermost handler; the
other effect is intercepted by the outermost handler thanks to the lift coercion.
In Biernacki et al.’s λHEL [5], lift_counter is well-typed. The lift coercion is
mandatory; without it, the code would be ill-typed.


³ A lift coercion behaves like an end-of-scope marker for the name s. This concept
has been studied, independently of effects, by various authors [13,10].

Lexically scoped handlers and dynamic allocation of effect labels. Perhaps the
most straightforward way to describe the operational behavior of lexically scoped 
handlers is by means of their encoding in terms of ordinary effect handlers and 
dynamic generation of effect labels. So, let us first extend our calculus with 
dynamic allocation of effect labels. We introduce the construct effect s in e, 
which binds the effect name s to a freshly generated effect label, then executes e. 
The effect name s is a local variable: its scope is the subexpression e. An effect 
label is a runtime entity; later in the paper, we let ℓ range over effect labels. In
this setting, a “lexically scoped handler” is encoded (simulated) as follows: 


lex-handle e with h | r ≜                                        (1)
  effect s in handle e (λx. perform s x) with s : h | r


This code first generates a fresh effect label, denoted by the name s. Then, it
installs a handler for the name s. This handler monitors the application of the
expression e to the anonymous function λx. perform s x, which can be viewed as
a “capability” to perform the effect s.

A noteworthy aspect of the syntactic sugar lex-handle e with h | r is that it 
does not explicitly involve any effect name. This construct is known as a “lexically 
scoped handler”. 

A lexically scoped handler can be used to write a fixed version of bad_counter: 


counter ≜ λff f.
  (lex-handle λtick. ff (λx. tick (); f x) with
   λ_ k. λn. k () (n + 1) | λy. λn. (y, n)) 0                     (2)


When lex-handle is executed, a fresh effect label (which is never explicitly 
mentioned in this code) is generated. The variable tick stands for the “capabil- 
ity” to perform this fresh nameless effect. One can check that the expression 
counter (counter (λf. f ())) (λ_. ()) reduces to the value (((), 1), 1), as desired,
because the two instances of counter generate two distinct dynamic labels and 
install one handler for each of these labels. Thus, no collision takes place. 
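
The same behavior can be observed in OCaml 5, where effect constructors, like exceptions, are generative. The sketch below assumes OCaml 5.0 or later and is only an analogy with the encoding (1), not TESLANG itself: declaring the constructor Tick in a local module, which is evaluated at every call of counter, plays the role of effect s in e. The module name M is ours. The closure passed to ff plays the role of the capability tick.

```ocaml
(* Requires OCaml >= 5.0. *)
open Effect
open Effect.Deep

let counter (type a b c) (ff : (a -> b) -> c) (f : a -> b) : c * int =
  (* The local module is evaluated at each call of counter, so each call
     allocates a fresh effect constructor: two calls never share a label. *)
  let module M = struct
    type _ Effect.t += Tick : unit Effect.t
  end in
  let run =
    match_with
      (fun () -> ff (fun x -> perform M.Tick; f x))
      ()
      { retc = (fun y -> fun n -> (y, n));
        exnc = raise;
        effc = (fun (type r) (eff : r Effect.t) ->
          match eff with
          | M.Tick ->
            Some (fun (k : (r, _) continuation) ->
              fun n -> continue k () (n + 1))
          (* Any other effect, including the Tick of another call of
             counter, is forwarded to the enclosing handlers. *)
          | _ -> None) }
  in
  run 0

let () =
  (* Each handler now observes exactly one effect. *)
  assert (counter (counter (fun f -> f ())) (fun _ -> ()) = (((), 1), 1))
```

Here the inner handler does not recognize the outer call's constructor, so the effect performed by the outer proxy propagates to the outer handler, and each count is 1.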


Arguments in favor of dynamic allocation of effect labels. In summary, dynamic 
allocation of effect labels is a way of avoiding collisions between effect names. It 
can express lexically scoped handlers, but does not impose the use of lexically 
scoped handlers: it also allows working with global names when desired. Its 
dynamic semantics is simple. It is in use in several established programming 
languages, such as Standard ML and OCaml. 

We believe that lexically scoped handlers are an elegant idiom, which is well 
suited to many but not all situations. So, we would not be satisfied with a 
restricted programming language where lexically scoped handlers are the sole 
form of effect handling. Indeed, lexically scoped handlers impose a somewhat 
unnatural “capability-passing” style, where the capability to perform an effect 
must be passed as an argument to a function (or captured in its closure). This 
style becomes especially cumbersome when multiple effects are involved. Implicit 
arguments can help, as suggested by Zhang and Myers [41] and by Odersky et 
al. [27]. However, elaboration of implicit arguments is usually a type-directed 

n ::= s | ℓ 
v ::= () | ℓ | rec f x. e | §K 
e ::= v | x | e e | ref e | !e | e := e 
    | effect s in e | perform n e | handle e with n : v | v | eff ℓ v K 
K ::= • | e K | K v | ref K | !K | e := K | K := v 
    | perform ℓ K | handle K with ℓ : v | v 


Fig. 1. Syntax of effect values, values, expressions, and evaluation contexts 


effect s in e / σ  →h  e[ℓ/s] / σ[ℓ ↦ ()] 
perform ℓ v / σ  →h  eff ℓ v • / σ 
handle v with ℓ : h | r / σ  →h  r v / σ 
handle (eff ℓ v K) with ℓ : h | r / σ  →h  h v §(handle K with ℓ : h | r) / σ 
§K v / σ  →h  K[v] / σ 

(eff ℓ v1 K) v2 / σ  →h  eff ℓ v1 (K v2) / σ 
e1 (eff ℓ v2 K) / σ  →h  eff ℓ v2 (e1 K) / σ 
handle (eff ℓ v K) with ℓ' : h | r / σ  →h  eff ℓ v (handle K with ℓ' : h | r) / σ 


Fig. 2. The head reduction relation (selected rules) 


translation. If at all possible, we wish to preserve the “type erasure” property: 
that is, we prefer a language whose operational semantics is not influenced by 
type information, because such a semantics is easier to explain to an end user. 
Similarly, we wish to avoid effect coercions because we believe that they introduce 
unwarranted complexity, making the language and its dynamic semantics more 
difficult to explain to programmers. 


3 Syntax and Semantics 


We introduce TESLANG, a calculus with mutable state, effect handlers, multiple 
named effects, dynamic allocation of effect labels, and multi-shot continuations. 
The operational semantics of this calculus allows a continuation to be invoked 
several times. With respect to this semantics, the type system presented in this 
paper (§4) is strongly sound: it rules out all runtime errors (§5). With respect 
to a dynamic semantics where invoking a continuation twice causes a runtime 
failure, such as the semantics of OCaml 5, our type system would be weakly 
sound, because it does not rule out this kind of runtime failure. Ensuring that 
every continuation is invoked at most once would require an affine type system 
and is beyond the scope of this paper. We note that an affine program logic, such 
as Hazel [35], can guarantee that no continuation is invoked twice, therefore can 
guarantee strong soundness even in the presence of one-shot continuations. 
Our small-step operational semantics is very straightforward. It is equipped 
with dynamic allocation of effect labels and with a standard treatment of effects 
and effect handlers [2]. When an effect with label ℓ is performed, a dynamic 
lookup takes place: the nearest enclosing handler that is able to handle the 
label ℓ is selected. This is expressed, in small-step style, via several reduction 
rules. In contrast with some papers in the literature, where coercions influence 
the process of selecting a handler [21,4,5], here, this process is based purely on 
equality of effect labels. 


3.1 Syntax 


We let f and x range over an infinite set of variables. We let s range over an 
infinite set of variables, and we refer to these variables as effect names. These two 
namespaces are independent of one another: an effect name cannot be passed 
as a parameter to a function. We let ℓ range over an infinite set of addresses. 
These addresses model both memory locations and effect labels. Both kinds of 
entities are dynamically allocated, so, for simplicity, we use a single namespace 
of addresses and a single store. Whereas variables f,x and effect names s can 
appear in source programs, memory locations and effect labels ℓ exist only at 
runtime. The reduction rules of the small-step semantics cause them to appear. 

The syntax of effect values, values, expressions, and evaluation contexts is 
shown in Figure 1. 

An effect value n is either an effect name s or an effect label ℓ. This syntactic 
category is closed under substitutions of effect labels for effect names. It is used 
in the constructs perform n e and handle e with n : v | v. A programmer always 
writes perform s e and handle e with s : v | v, where s is an effect name, but the 
more general form is required in the operational semantics. 

A value v is the unit value (), a memory location ℓ, a possibly recursive 
function rec f x. e, or a continuation §K. 

The syntax of expressions e includes values, variables, function application, 
operations for allocating, reading, and writing references, as well as constructs 
for allocating a fresh effect label, performing an effect, and handling an effect. 
Sequencing is encoded as function application: let x = e1 in e2 is sugar for 
(λx. e2) e1. The construct effect s in e dynamically allocates a new effect 
label and binds the effect name s to this label in the expression e. The con- 
struct perform s v performs an effect whose name is s and whose payload is 
the value v. The construct handle e with s : h | r monitors the execution of the 
expression e. If an effect named s is performed, then the effect branch h takes 
control. If a value is returned, then the return branch r takes control. An ef- 
fect that carries a name other than s is propagated up through this construct. 
Finally, the construct eff ℓ v K, an active effect, does not appear in source 
programs, but plays a role in the operational semantics, as we shall explain in 
the next subsection. 

Our Coq formalization [36] covers a richer calculus, whose features include 
base types, pairs, sums, and lists. 

The syntax of evaluation contexts K defines a right-to-left evaluation order. 
This choice is arbitrary: it is inspired by Iris’s HeapLang language [33], but our 
results would hold also with left-to-right evaluation. 


3.2 Semantics 


The operational semantics of TESLANG involves two relations, namely the head 
reduction relation e / σ →h e' / σ' and the reduction relation e / σ → e' / σ'. 
They act on configurations, where a configuration e / σ is a pair of an 
expression e and a store σ. The head reduction relation, a fragment of whose 
definition appears in Figure 2, is the most interesting relation. The reduction 
relation, whose definition is omitted, allows one step of head reduction to take 
place under an evaluation context. 

A store is a finite map of addresses to values. We use addresses ℓ to denote 
both memory locations and effect labels. If ℓ denotes a memory location (that 
is, the address of a reference), then σ(ℓ) is the value stored at this address. If ℓ 
denotes an effect label, then the value σ(ℓ) is irrelevant: by convention, we use 
the unit value (). 

The rules not shown in Figure 2, such as β-reduction and the rules for 
allocating, reading, and writing references, are standard. 

The first rule in Figure 2 states that effect s in e allocates a fresh address ℓ, 
extends the store with a mapping of ℓ to the unit value, and substitutes the effect 
label ℓ for the effect name s in the expression e. (The rule has the side condition 
ℓ ∉ dom σ.) According to the second reduction rule, perform ℓ v reduces to an 
active effect eff ℓ v •. An active effect has the ability to capture the surrounding 
evaluation context, until it reaches a handler that is able to handle it. In this 
rule, it is initialized with an empty evaluation context •. The last three rules in 
Figure 2 show how an active effect captures its evaluation context, one frame 
at a time. (The last rule has the side condition ℓ ≠ ℓ'.) The third and fourth 
rules in Figure 2 show how the return branch or the effect branch of a handle 
construct is taken. In the latter rule, the handler h is applied to the payload 
value v and to a continuation, which reifies the captured evaluation context K. 
The continuation contains a copy of the effect handler: this is a deep-handler 
semantics [15]. The fifth reduction rule in Figure 2 describes the application of 
a continuation §K to a value v. 
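
The frame-by-frame capture described above can be sketched in Python. This is a simplified model, not the reduction relation itself: the evaluation context is represented as a list of frames, only two frame kinds are modeled, and handler branches are plain strings. The names `Handle`, `AppArg`, and `perform` are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Frame  handle • with label : h | r."""
    label: int
    h: str
    r: str


@dataclass(frozen=True)
class AppArg:
    """Frame  e •  (the argument of an application is being evaluated)."""
    fun: str


def perform(stack, label, payload):
    """Select the handler for `label` among the frames enclosing a perform.

    `stack` lists the enclosing frames from outermost to innermost. The
    result is: the frames left outside the selected handler, the effect
    branch that takes control, the payload, and the captured continuation
    (a list of frames, innermost first).
    """
    captured = []                                  # the empty context
    for depth in range(len(stack) - 1, -1, -1):    # innermost frame first
        frame = stack[depth]
        captured.append(frame)                     # one frame at a time
        if isinstance(frame, Handle) and frame.label == label:
            # The matching handler is part of the continuation:
            # this is the deep-handler reading.
            return stack[:depth], frame.h, payload, captured
    raise LookupError(f"no enclosing handler for label {label}")


stack = [Handle(0, "h0", "r0"), AppArg("g"), Handle(1, "h1", "r1"), AppArg("f")]
outer, branch, payload, k = perform(stack, 0, "v")
print(branch)   # h0
print(len(k))   # 4
```

A handler for a different label (here, label 1) is captured like any other frame, which corresponds to the last rule of Figure 2 and its side condition.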


4 Type System 


4.1 Syntax of types, rows, and signatures 


We let α, β, and γ range over an infinite set of type variables. We let θ range 
over an infinite set of row variables. We distinguish three syntactic categories, 
namely types, rows, and signatures (Figure 3). The syntax of types is stable under 
substitutions of types τ for type variables α. The syntax of rows is stable under 
substitutions of rows ρ for row variables θ, for an ad hoc notion of substitution, 
which reduces row concatenation expressions “ρ · ρ'” on the fly.⁴ 


4 The distinction between rows and signatures enforces the view that a row ρ is a list 
where each component (known as a “signature”) is either a signature for an effect 
name s or a row variable θ. Thus, we impose a simple form on rows. As an alternate 

Fig. 4. The type system (selected rules) 

Our types are standard: they include the unit type unit, the bottom and top 
types ⊥ and ⊤, type variables α, reference types, effect-annotated arrow types, 
value-polymorphic types, and effect-polymorphic types. Effect-annotated arrow 
types and effect-polymorphic types are discussed below. 

A row is a list of signatures σ. A signature, in turn, is either a singleton 
signature s : ι' ⇒ κ' or a row variable θ. A singleton signature s : ι' ⇒ κ' means 
that performing the effect s is permitted and is analogous to calling a function 
of argument type ι' and return type κ'. According to this reading, a singleton 
signature of the form s : ⊥ ⇒ ⊤ actually forbids the effect s, because a function 
whose argument type is ⊥ can never be called. We write s : abs as a short-hand 
for this signature, and we refer to it as an absence signature for the effect s. 

In addition to an argument type τ and a return type κ, an arrow type τ -[ρ]→ κ 
carries an “effect”, that is, a row ρ. Intuitively, a value of type τ -[ρ]→ κ is a 
function, which, when applied to an argument of type τ, either returns a result 
of type κ or performs an effect that is permitted by the row ρ. On top of this 
standard reading of effect annotations, TES introduces a novel aspect. The effect 
annotation ρ is interpreted not only as a set of permitted effects, but also as a 
precondition: we impose the semantic requirement that a function of type 
τ -[ρ]→ κ can be invoked only if the multiset of effect labels denoted by the 
row ρ has no duplicate elements. 
This is not a syntactic requirement, which would be either “true” or “false” and 
would be decided just by inspecting the syntax of the row ρ. Indeed, in general, 
a row contains occurrences of effect names s, which denote a-priori-unknown 
effect labels, and of row variables θ, which denote a-priori-unknown multisets of 
effect labels. What we wish to require is that, at runtime, after effect names and 
row variables have been substituted away, a function of type τ -[ρ]→ κ can be 
invoked only if no effect label appears twice in the resulting closed row. Thus, 
the requirement that “ρ contains no duplicate labels” should be 
thought of as a disjointness hypothesis bearing on the row ρ. Such a hypothesis 
may or may not be satisfied, depending on how the effect names and row variables 
that occur in ρ are instantiated. 
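
As an illustration of this semantic reading, the following Python sketch checks a disjointness hypothesis once a row has been closed. The representation is an assumption made for this example: a row is a list of entries `("name", s)` or `("var", θ)`, argument and return types are omitted because only labels matter here, and the functions `instantiate` and `is_disjoint` are hypothetical helpers, not part of the paper's formal development.

```python
from collections import Counter


def instantiate(row, names, row_vars):
    """Close a row: map effect names to labels and row variables to labels.

    row:      list of ("name", s) or ("var", theta) entries
    names:    dict from effect names to effect labels
    row_vars: dict from row variables to lists of effect labels
    """
    labels = []
    for kind, ident in row:
        if kind == "name":
            labels.append(names[ident])
        else:
            labels.extend(row_vars[ident])
    return labels


def is_disjoint(row, names, row_vars):
    """True if no effect label appears twice in the closed row."""
    counts = Counter(instantiate(row, names, row_vars))
    return all(n == 1 for n in counts.values())


row = [("name", "yield"), ("var", "theta")]
print(is_disjoint(row, {"yield": 7}, {"theta": [3, 4]}))  # True
print(is_disjoint(row, {"yield": 7}, {"theta": [7]}))     # False
```

The same row is acceptable under the first instantiation and unacceptable under the second, which is why the hypothesis cannot be decided by inspecting the syntax of the row alone.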

In TES, disjointness hypotheses are sometimes explicit and most of the time 
implicit. In the subsumption judgments (Figure 5), a disjointness context D is 
explicit: it can be interpreted as a conjunction of disjointness hypotheses. In 
function types τ -[ρ]→ κ and in typing judgments Ξ | Δ | Γ ⊢ e : ρ : τ, an 
implicit disjointness hypothesis bearing on the row ρ is built in, so there is no 
need for an explicit disjointness context. 

An effect-polymorphic type ∀θ. τ involves a universal quantification over a 
row variable θ. For instance, the function iter, which iterates over a list, can be 
defined as follows: 


iter = rec iter xs f. match xs with (λx xs. f x; iter xs f | λ_. ())      (3) 


path, one could use a single syntactic category ρ ::= () | ρ · ρ | (s : ι ⇒ κ) | θ, 
where a more general form of row concatenation is allowed. This would allow using 
a standard notion of substitution, and would lead to different statements for some 
of the row subsumption rules. 

This function admits the following value- and effect-polymorphic type: 


iter : ∀α. ∀θ. α list → (α -[θ]→ unit) -[θ]→ unit 


This type states that the call iter xs f is safe, regardless of what the elements 
of the list xs might be, and regardless of what effects the user function f might 
perform. This type also guarantees that iter does not perform any effect of 
its own: instantiating θ with () shows that this must be the case. Finally, one 
might think that this type guarantees that iter cannot intercept the effects 
performed by f. This may or may not be true, depending on which interpretation 
of effect-polymorphic types is chosen. A stronger interpretation can guarantee 
this property, but rules out certain useful programming language constructs, such 
as “dynamic-wind”. Conversely, a weaker interpretation of effect-polymorphic 
types allows type-checking “dynamic-wind”, but breaks this guarantee. At this 
time, the interpretation that we have verified in Coq is the weaker one (§5). We 
further discuss this point in Section 6. 


4.2 The typing judgment 


A typing judgment in TES takes the form Ξ | Δ | Γ ⊢ e : ρ : τ. It involves 
three environments: a row- and type-variable context Ξ, which binds row and 
type variables θ and α; an effect-name context Δ, which binds effect names s; and 
a type environment Γ, which maps variables x to types τ. This typing judgment 
states that the expression e has effect ρ and type τ. Like an arrow type, this 
judgment involves an implicit disjointness hypothesis bearing on the row ρ. That 
is, this judgment guarantees that it is safe to execute e provided the row variables 
and type variables in Ξ are instantiated in such a way that the multiset of effect 
labels denoted by ρ has no duplicate elements. 

A selection of the typing rules appears in Figure 4. The typing rules for 
variables, functions, and applications are the same as in most type-and-effect 
systems. The typing rules for references are also standard, and are omitted. 
The rules TYPEINTRO, TYPEELIM, ROWINTRO, ROWELIM, which introduce and 
eliminate value- and effect-polymorphic types, are also standard. In the presence 
of mutable state, an unrestricted introduction rule for polymorphic types is un- 
sound [34]. In this paper, we avoid this problem simply by building the value 
restriction [39,12] into TYPEINTRO and ROWINTRO. Our Coq formalization [36] 
proposes a more elaborate approach, where function types and typing judgments 
are annotated with purity attributes. This approach yields a slightly more expres- 
sive system, where, in particular, perform s x is considered a pure expression, 
therefore can receive a polymorphic type. 

Rule EFFECT, read from bottom to top, changes the current effect from ρ to 
(s : abs) · ρ. Intuitively, this means several things. First, while type-checking e, it 
is safe to assume that the effect label denoted by s is disjoint from the multiset 
of effect labels denoted by ρ. This assumption is implicitly expressed by the mere 
appearance of the row (s : abs) · ρ in the premise. This assumption is justified 
indeed, since the effect name s is bound to a fresh effect label when effect s in e 
is executed. Second, because of the absence signature s : abs, one must check 
that the expression e does not perform any effect with the name s. This seems 
a natural and unavoidable restriction: if such an effect were allowed, there would 
be no static effect name by which it could be described. Third, because of the 
side condition s ∉ ρ, one must check that the row that appears in the premise 
contains at most one singleton signature for the effect name s. As a counter- 
example, if the expression e has effect (s : abs) · (s : abs), then the typing rule 
EFFECT cannot be applied. The subsumption rule SUB cannot help, because the 
subsumption judgment (s : abs) · (s : abs) ≤R (s : abs) does not hold. Thus, the 
rule EFFECT enforces a disjointness constraint. 

Rule PERFORM states that, when one performs an effect whose signature 
is s : ι ⇒ κ, one must pass a payload value of type ι, and, in return, one can 
expect a value of type κ. This supports the intuitive idea that performing an 
effect is analogous to calling an effect-free function of type ι → κ. 

Rule HANDLE type-checks handle e with s : h | r, where the expression e 
is monitored by a handler for the effect s. This rule expresses the idea that 
this construct establishes a boundary between the inside, where effects named s 
may be performed in accord with the signature s : ι ⇒ κ, and the outside, where 
effects named s may be performed in accord with a different signature s : ι' ⇒ κ'. 
Because s : abs is sugar for s : ⊥ ⇒ ⊤, this rule also covers the common case 
where the effect s is absent on the outside. Both the effect branch h and the 
return branch r are part of the “outside world”, so their effects are described 
by the outside row ρ'. This remark explains all occurrences of ρ' in the last two 
premises, except the one in the type of the continuation. The continuation, which 
is the second parameter of the effect branch h, has type κ -[ρ']→ τ'. Because we 
have adopted a “deep-handler” semantics (§3), a copy of the handler is reinstalled 
inside the continuation. This explains why the effect ρ' and the result type τ' of 
the continuation are the same as those of the whole handle construct. 

Rule SUB weakens a typing judgment by replacing an effect ρ and a type τ 
with a weaker effect ρ' and a weaker type τ'. This rule relies on several sub- 
sumption judgments, which we discuss next. 


4.3 The subsumption judgments 


The subsumption judgments on types, signatures, and rows appear in Figure 5. 
An original aspect is that these judgments depend on a disjointness context D, 
which appears on the left of the turnstile. A disjointness context is a (possibly 
empty, unordered) list of rows, and is interpreted as a conjunction of disjointness 
hypotheses: one hypothesis bears on each row. For instance, the disjointness 
context (s1 : ι1 ⇒ κ1) · (s2 : ι2 ⇒ κ2), (s3 : ι3 ⇒ κ3) · θ, which is a list of 
two rows, is equivalent to a conjunction of two disjointness hypotheses. The 
first hypothesis is equivalent to s1 ≠ s2: it represents the assumption that the 
effect names s1 and s2 denote two distinct effect labels. The second hypothesis 
expresses the assumption that the effect label denoted by s3 is not a member of 
the multiset of effect labels denoted by θ and that this multiset has no duplicate 
elements. 

In the subsumption rules, the disjointness context is extended in the rule 
ARROW and exploited in the rule ERASE. Elsewhere, it is just transported. 


Subsumption on types. The subsumption judgment on types D ⊢ τ ≤T τ' means 
that, under the hypothesis D, τ is a subtype of τ'. The rules in Figure 5 state 
that this relation is reflexive, transitive, and admits ⊥ and ⊤ as bottom and 
top elements. On function types, as usual, subsumption is contravariant in the 
domain and covariant in the effect and in the codomain. One original aspect 
of ARROW is that this rule enriches the disjointness context: in the premises, the 
disjointness context changes from D to D, ρ'. The intuitive reason why this is 
sound is that if someone uses a function at type τ' -[ρ']→ κ' then (at the point 
where the function is used) the disjointness hypothesis ρ' must be satisfied, 
because this hypothesis is part of our interpretation of function types. Thus, 
when proving that a function of type τ -[ρ]→ κ can be used as a function of type 
τ' -[ρ']→ κ', it is safe to rely on the disjointness hypothesis ρ'. 


Subsumption on signatures. The subsumption judgment on signatures takes the 
form D ⊢ σ ≤S σ'. Signature subsumption is reflexive and transitive. (Reflexivity 
is given by SIGREFL; transitivity is derivable.) According to SIGCONS, unlike 
the standard function type constructor · → ·, the signature constructor s : · ⇒ · 
is covariant in its domain and contravariant in its codomain. Indeed, when the 
signature s : ι ⇒ κ appears in the effect of an expression e, this means that e 
has permission to perform an effect named s at type ι ⇒ κ. In other words, e 
can assume that performing an effect named s is analogous to calling a function 
of type ι → κ. This explains the reversed variance. 


Subsumption on rows. The row subsumption judgment is D ⊢b ρ ≤R ρ'. The 
Boolean parameter b will be explained shortly. Row subsumption is reflexive 
and transitive. (Reflexivity is derivable; transitivity is given by ROWTRANS.) 
By combining EMPTY, EXTEND, ROWCONS, SWAP, and ROWTRANS, one finds 
that if two rows, viewed as multisets of effect signatures, are related by multi- 
set inclusion, then they are related by subsumption. Thus, subsumption allows 
permuting row entries in arbitrary ways and extending a row with new entries. 
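
The multiset-inclusion criterion can be sketched in a few lines of Python. The sketch treats each row entry as an opaque hashable value, so it ignores signature subsumption inside entries, and it does not model ERASE. The function name `included` and the tuple encoding of signatures are assumptions made for this illustration.

```python
from collections import Counter


def included(row, wider):
    """True if `row`, viewed as a multiset of entries, is included in `wider`."""
    need, have = Counter(row), Counter(wider)
    return all(have[entry] >= n for entry, n in need.items())


sig_s = ("s", "unit", "unit")       # stands for  s : unit ⇒ unit
sig_t = ("t", "int", "bool")        # stands for  t : int ⇒ bool

print(included([sig_s, "theta"], ["theta", sig_s]))    # True: permutation
print(included([sig_s], [sig_t, sig_s, "theta"]))      # True: extension
print(included([sig_s, sig_s], [sig_s]))               # False: multiplicity
```

Because multiplicities are taken into account, a row with two entries for s is not included in a row with a single entry for s. This matches the earlier remark that (s : abs) · (s : abs) ≤R (s : abs) does not hold.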

The last row subsumption rule, ERASE, allows dropping an effect signature 
of the form s : abs. This rule may seem plausible because, both in the presence 
of the effect signature s : abs and in its absence, the effect s is forbidden. However, 
an unqualified axiom ⊢ (s : abs) · ρ ≤R ρ would be unsound. This is due to our 
interpretation of the row carried by a typing judgment (or by a function type) 
as a disjointness hypothesis. By changing a typing judgment that carries the row 
(s : abs) · ρ into one that carries the row ρ, one removes the hypothesis that the 
effect label denoted by s is not a member of the multiset of effect labels denoted 
by ρ. In order to safely remove a hypothesis, one must prove that it is satisfied. 
This explains why ERASE must carry the premise D ⊩ s # ρ, whose intuitive 
meaning is that “the hypotheses in D guarantee that the effect label denoted 
by s is not among the effect labels denoted by ρ”. 

The parameter b serves to forbid a use of ERASE under ROWCONS. ERASE 
requires this flag to be true, but ROWCONS sets it to false in its premise. Without 
this restriction, one could first combine ERASE and DISJEMPTY to prove 
⊢ (s : abs) · () ≤R (), then use ROWCONS and induction to obtain 
(s : abs) · ρ ≤R ρ without any side condition, thus circumventing the side 
condition in ERASE. 

The four rules that define the effect/row disjointness judgment D ⊩ s # ρ 
are straightforward. The first two rules decompose the row ρ, which is a list of 
effect signatures σ. The last two rules look up the disjointness context D so as 
to find a disjointness hypothesis ρ' that implies the goal. Whether ρ' implies the 
goal is decided based on a simple syntactic criterion: the relation · ⊆m · denotes 
multiset inclusion; the row on the right-hand side is viewed as a multiset of effect 
signatures.⁵ 
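
Since Figure 5 is not reproduced here, the following Python sketch is only one plausible reading of these four rules, under stated assumptions: rows are lists of strings, each string being an effect name or a row variable; signatures are compared by name only; and an entry is accepted when the two-entry row formed by s and that entry is included, as a multiset, in some hypothesis of D. The function name `entails_disjoint` is hypothetical.

```python
from collections import Counter


def entails_disjoint(context, s, row):
    """Sketch of the judgment  D ⊩ s # row.

    context: list of hypothesis rows (the disjointness context D)
    s:       an effect name
    row:     list of entries (effect names or row variables)
    """
    def some_hypothesis_separates(entry):
        goal = Counter([s, entry])          # the two-entry row  s · entry
        for hypothesis in context:
            have = Counter(hypothesis)
            if all(have[e] >= n for e, n in goal.items()):
                return True                 # goal is included in hypothesis
        return False

    # Decompose the row entry by entry; the empty row needs no hypothesis.
    return all(some_hypothesis_separates(entry) for entry in row)


print(entails_disjoint([["s", "theta"]], "s", ["theta"]))  # True
print(entails_disjoint([], "s", ["theta"]))                # False
```

Under this reading, an empty disjointness context can only establish disjointness from the empty row, and proving that s is disjoint from a row that itself mentions s requires a hypothesis in which s occurs twice, that is, an unsatisfiable one.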

The desire to support ERASE is the reason why the subsumption judgments 
carry a disjointness context. In a hypothetical simplified system where these 
judgments do not carry such a context, the premise of ERASE would have to use 
an empty disjointness context True. This premise would become True ⊩ s # ρ, 
which is false, so ERASE would become inapplicable. Yet ERASE is desirable, 
because it is useful in practice. We use it to type-check our encoding of a lexically 
scoped handler: this is illustrated in Section 4.4. 


Why is ⊢ (s : abs) · ρ ≤R ρ unsound? In the presence of this axiom, the judgment 
⊢ (s : abs) · (s : abs) ≤R (s : abs) would be derivable. This judgment can be 
exploited to type-check the following unsafe program: 


1 effect s in 
2 handle 
3   handle (perform s ()) with s : λx _. not x | λ_. true 
4 with s : λ_ _. () | λ_. () 


This program is unsafe because the effect s is performed with a payload of 
type unit, namely the unit value () on line 3, and this effect is handled by the 
innermost handler, also on line 3, which expects the payload x to be a Boolean 
value. When this program is executed, it becomes stuck by attempting to execute 
the function application not (). 

Yet, under the assumption ⊢ (s : abs) · (s : abs) ≤R (s : abs), this program 
is well-typed, with an empty row and with the type unit. Beginning at the 
root and working towards the leaves, the type derivation begins with an appli- 
cation of EFFECT, which changes the empty row into the row (s : abs). Then, 
by using SUB and by exploiting the above assumption, the row (s : abs) can be 
changed to (s : abs) · (s : abs). At this point, the harm is done. Indeed, under the 
row (s : abs) · (s : abs), the subprogram at lines 2–4 is well-typed. The fact that 
this row includes two signatures for the effect name s allows us to install two 
handlers for this name. The handler on line 2 allows its handlee, the expression 


5 Our Coq code [36] presently employs a different representation of disjointness con- 
texts and a different definition of the effect/row disjointness judgment. We believe, 
but have not yet checked, that the Coq and paper formulations are equivalent. 
on line 3, to perform effects according to the signature s : unit ⇒ unit. The 
handler on line 3 allows its handlee to perform effects as per s : bool ⇒ unit. 
The expression perform s () is type-checked with respect to the composite row 
(s : unit ⇒ unit) · (s : bool ⇒ unit), which means that this expression must 
respect either of these two signatures. It does indeed respect the first one, so it 
is well-typed. 


4.4 Examples 


Filter Recall the higher-order iteration function iter (Eq. 3), whose type is 


iter : ∀α. ∀θ. α list → (α -[θ]→ unit) -[θ]→ unit. 


Let us use iter in the definition of filter: 

filter xs f = let g = (λx. if f x then perform yield x) in iter xs g 


The expression filter xs f “yields” each element x of the list xs in turn, by 
performing a yield effect if f x returns true. In TES, filter is well-typed, and 
its type is: 


filter : ∀α. ∀θ. α list → (α -[θ]→ bool) -[(yield : α ⇒ unit) · θ]→ unit 


Checking that filter is well-typed is not difficult. Under the assumption that 
f has type α -[θ]→ bool, the subexpression f x has effect θ. Under the assumption 
that x has type α, the subexpression perform yield x has effect (yield : α ⇒ 
unit). Because our subsumption rules allow extending a row with a new entry and 
exchanging row entries, the composite subexpression if f x then perform yield x 
admits the composite effect (yield : α ⇒ unit) · θ. 

What does filter’s type mean? Ostensibly, the row (yield : α ⇒ unit) · θ 
tells us that every effect performed by filter xs f must be either a yield effect 
or an effect caused by f. Less obviously, these alternatives must be mutually 
exclusive: indeed, the row (yield : α ⇒ unit) · θ carries the implicit requirement 
that the effect label denoted by yield is not among the effect labels denoted by 
θ. In other words, filter’s type forbids f from performing yield effects. 
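
To see why this restriction matters, here is a hedged Python sketch in which effects are emulated with generators (one-shot only) and the yield effect is identified by a shared tag. The names `YIELD`, `iter_`, `filter_`, `collect`, `is_even`, and `noisy` are assumptions made for this illustration and do not come from the paper.

```python
YIELD = "yield"   # a single shared tag standing for the effect name yield


def iter_(xs, f):
    """Apply the effectful function f to every element of xs."""
    for x in xs:
        yield from f(x)


def filter_(xs, f):
    """Perform a yield effect for every element that satisfies f."""
    def g(x):
        if (yield from f(x)):
            yield (YIELD, x)            # perform yield x
    yield from iter_(xs, g)


def collect(comp):
    """Handle every yield effect of `comp` by recording its payload."""
    seen = []
    for label, payload in comp:
        if label != YIELD:
            raise RuntimeError(f"unhandled effect: {label!r}")
        seen.append(payload)
    return seen


def is_even(x):        # a predicate that performs no effect
    return x % 2 == 0
    yield              # unreachable; makes this a generator function


def noisy(x):          # a predicate that itself performs a yield effect
    yield (YIELD, -x)
    return True


print(collect(filter_([1, 2, 3, 4], is_even)))   # [2, 4]
print(collect(filter_([1, 2], noisy)))           # [-1, 1, -2, 2]
```

In the second call, the handler cannot tell the effects performed by the predicate from those performed by the filter itself. The disjointness requirement carried by the row (yield : α ⇒ unit) · θ rules out this situation statically.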

The reader may wonder what prevents us from instantiating θ with a row 
that includes the effect name yield, such as (yield : α ⇒ unit). The answer is, 
nothing prevents such an instantiation. The result, however, would be a view of 
filter as a function whose effect is (yield : α ⇒ unit) · (yield : α ⇒ unit). Such 
an effect carries an unsatisfiable disjointness hypothesis, namely yield ≠ yield. 
As a result, once the type of filter has been instantiated in this way, filter 
cannot be called anymore.⁶ 


6 Technically, an application of this instantiated filter function can still be well-typed, 
but only if it appears in the body of a function which itself carries an unsatisfiable 
disjointness hypothesis and therefore can never be called. 

Lexically scoped handlers We now derive a typing rule for lexically scoped 
handlers. Recall the encoding of a lexically scoped handler (Eq. 1):⁷ 


lex-handle_s e with h | r = 
  effect s in handle e (λx. perform s x) with s : h | r 

For this construct, TES admits the following derived typing rule: 

LEXHANDLE 
Ξ | Δ | Γ ⊢ e : ρ : ∀θ. (ι -[θ]→ κ) -[θ · ρ]→ τ        s ∉ ρ, ι, κ, τ, τ' 
Ξ | Δ | Γ ⊢ h : ρ : ι → (κ -[ρ]→ τ') -[ρ]→ τ'        Ξ | Δ | Γ ⊢ r : ρ : τ -[ρ]→ τ' 
────────────────────────────────────────────── 
Ξ | Δ | Γ ⊢ lex-handle_s e with h | r : ρ : τ' 


This rule is similar to the typing rule for lexically scoped handlers that appears 
in Figure 3 of Biernacki et al.’s paper [6]. What is new and noteworthy is that we 
obtain this rule as a special case of a more permissive type discipline, TES, which 
supports general effect handlers, as opposed to just lexically scoped handlers. 

In LEXHANDLE, whereas the effect on the outside is ρ, the effect on the inside 
is θ · ρ. That is, inside the handlee, one more effect is permitted. The handlee 
(the expression e) must be polymorphic in the row variable θ: that is, it must 
treat this extra effect as an abstract effect. 

The derivation of LEXHANDLE involves an application of EFFECT and an 
application of HANDLE. While proving that the premises of HANDLE hold, a key 
step is to prove that the type of the effect branch h can be weakened as follows, 
where ρ' is a shorthand for (s : abs) · ρ: 

Ξ | Δ | Γ ⊢ h : ρ : ι → (κ -[ρ]→ τ') -[ρ]→ τ'        ρ' = (s : abs) · ρ 
────────────────────────────────────────────── 
Ξ | Δ | Γ ⊢ h : ρ : ι → (κ -[ρ']→ τ') -[ρ']→ τ' 

It is not at all obvious that this is possible! Two occurrences of p must be changed 
into p’. One occurrence is positive and one is negative, and the rows p and p’ 
are not equal. Still, this implication can be established, via rule SUB. One must 
check the following chain of subsumption relations: 


ι → (κ -[ρ]→ τ') -[ρ]→ τ'  ≤T  ι → (κ -[ρ]→ τ') -[ρ']→ τ'  ≤T  ι → (κ -[ρ']→ τ') -[ρ']→ τ' 


The first step requires ⊢b ρ ≤R ρ', which, by EXTEND, is true. The second step 
requires ρ' ⊢true ρ' ≤R ρ, which, by ERASE, is true as well. The disjointness 
hypothesis ρ' plays a key role: indeed, True ⊢true ρ' ≤R ρ is false. In other 
words, ERASE is applicable because the disjointness hypothesis ρ' is available, 
and this hypothesis exists because ARROW causes it to appear as it descends 
into the domains of two function types that are annotated with ρ'. 


7 This encoding requires choosing an arbitrary name s that does not occur in e, h 
or r. Furthermore, in the derivation of the typing rule LEXHANDLE, s may need 
to be renamed. On paper, we would normally not mention these details. However, 
because our Coq code does not currently allow α-conversion of effect names, we 
make s a parameter of the macro lex-handle and we include a freshness hypothesis 
bearing on s in LEXHANDLE. 

Counter Using the type rule LEXHANDLE, it is straightforward to check that 
counter (§2, Eq. 2) can be assigned the following type: 


counter : ∀α β γ. (∀θ. (α -[θ]→ β) -[θ]→ γ) → ∀θ. (α -[θ]→ β) -[θ]→ (γ × int) 


This means that counter accepts an arbitrary effect-polymorphic second-order 
function ff and produces a function ff’ whose type is similar to ff’s type. The 
only difference between the types of ff and ff’ is in their result types, to wit, 
γ versus γ × int. 

It is not hard to see that the expression counter (counter (λf. f ())) (λ_. ()), 
where two instances of counter are nested, is also well-typed, and that its type 
is (unit × int) × int. 


Mix The following second-order function, mix, involves a potentially challenging 
mixture of features: 
mix f = 
  handle (perform s (); f ()) 
  with s : λ_ k. k () | λ_. () 
The effect name s occurs free in this code, so this is not an instance of a lexically 
scoped handler. (We assume that the name s is introduced by the surrounding 
context.) The subexpression perform s (); f () visibly performs the effect s and 
calls the unknown function f, which itself may perform various effects, perhaps 
including the effect s. This subexpression is monitored by a handler for the 
effect s at type unit = unit. 
In TES, mix is well-typed. In fact, it admits several types. We show three: 
the first two are equivalent, and the last one subsumes the first two. 
The first idea that comes to mind may be: “since f has an unknown effect, let’s
represent this effect with a row variable $\theta$”. Thus, one introduces a row variable $\theta$,
and one assumes that f has type $\mathsf{unit} \xrightarrow{\theta} \mathsf{unit}$. Under this assumption, one finds
that perform s (); f () has effect $(s : \mathsf{unit} \Rightarrow \mathsf{unit}) \cdot \theta$. (The subsumption rule
EXTEND is used, twice, to merge the effect of perform s () and the effect of f ().)
Finally, using HANDLE, one finds that the body of the function mix has effect
$(s : \mathsf{abs}) \cdot \theta$. In summary, mix admits the following type:


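$$\mathit{mix} : \forall \theta.\ (\mathsf{unit} \xrightarrow{\theta} \mathsf{unit}) \xrightarrow{(s\,:\,\mathsf{abs}) \cdot \theta} \mathsf{unit} \qquad (4)$$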


The effect $(s : \mathsf{abs}) \cdot \theta$ carried by the second arrow means that mix never throws
the effect s and transmits whatever effects f may throw, provided these effects do
not include s. Indeed, the row $(s : \mathsf{abs}) \cdot \theta$ is interpreted not only as a description
of mix’s potential effects, but also as a disjointness constraint. Thus, the row
$(s : \mathsf{abs}) \cdot \theta$ in this type (4) cannot be replaced with just $\theta$. Such a replacement
would amount to discarding the disjointness constraint, which would be unsound.

The reader may wonder what happens if $\theta$ is instantiated, in the above type,
with a row that mentions s, such as $s : \mathsf{int} \Rightarrow \mathsf{int}$. Technically, this is permitted,
but yields a version of mix whose effect is $(s : \mathsf{abs}) \cdot (s : \mathsf{int} \Rightarrow \mathsf{int})$. Such a
function can never be called.
Thus, this type (4) effectively forbids f from performing effect s. One may
wonder whether this fact can be made explicitly visible in the type of mix. In fact,
it can. By the subsumption rules ARROW, EXTEND, and ERASE, the type (4) is
equivalent to the following type:


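$$\mathit{mix} : \forall \theta.\ (\mathsf{unit} \xrightarrow{(s\,:\,\mathsf{abs}) \cdot \theta} \mathsf{unit}) \xrightarrow{(s\,:\,\mathsf{abs}) \cdot \theta} \mathsf{unit} \qquad (5)$$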


Indeed, under the disjointness constraint carried by the outer arrow, the rows $\theta$
and $(s : \mathsf{abs}) \cdot \theta$ are equivalent.

It is worth noting that this type allows the function f to use the effect s 
internally, if desired, and at an arbitrary type, provided this effect is handled 
internally by f and does not escape. 

Finally, one may wonder whether it is necessary to forbid f from visibly 
performing effect s. In fact, it is not: one can allow f to perform this effect and 
let it escape, provided it is performed at type $\mathsf{unit} \Rightarrow \mathsf{unit}$, which is the type
expected by the handler inside mix. It is not difficult to check that mix admits 
the following type: 


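$$\mathit{mix} : \forall \theta.\ (\mathsf{unit} \xrightarrow{(s\,:\,\mathsf{unit} \Rightarrow \mathsf{unit}) \cdot \theta} \mathsf{unit}) \xrightarrow{(s\,:\,\mathsf{abs}) \cdot \theta} \mathsf{unit} \qquad (6)$$

This type (6) is in fact more general than (that is, a subtype of) the previous
type (5). This follows directly from the fact that $s : \mathsf{abs}$ is a short-hand for
$s : \bot \Rightarrow \top$ and from the subsumption rules SIGCONS, ROWCONS, and ARROW.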


5 Metatheory 


In this section, we present the general architecture of the proof of our type 
soundness statement (Theorem 3), which states that, if a closed program e is 
well-typed, then e is safe: that is, e may diverge or terminate with a value, but 
cannot perform an unhandled effect. Full details are found in our Coq code [36]. 

Our first step is to interpret our typing judgments as semantic typing judgments.
A semantic typing judgment $\Xi \mid \Delta \mid \Gamma \vDash e : \rho : \tau$ is a logical assertion
stating that substituting certain values for the free variables of e yields a closed 
program that meets a certain specification. To fill in the details, one must define 
precisely which values may be substituted and what specification is met. 

To do so, we introduce TESLOGIC, an extension of Iris [16], an expressive 
Separation Logic. Iris’s base logic has no built-in support for effects and han- 
dlers, but allows constructing a program logic with such support. de Vilhena and 
Pottier define such a logic, Hazel [35]. Because Hazel is tailored for unnamed ef- 
fects and one-shot continuations, we cannot re-use it. Nevertheless, in the design 
of TESLOGIC, we do rely on one of Hazel’s key features, protocols. 

A protocol $\Psi$ describes a service on which the handlee can rely and which
the handler must implement. Mathematically, it is a binary relation between a
value v, the payload of the effect, and a predicate $\Phi$, the precondition of the
continuation for this effect. A typical example of a protocol is the pre/post protocol




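Weakest precondition

$$\mathit{wp}\ e\ \langle E \rangle \{\Phi\} \triangleq \mathrm{ValidDistinct}\ E.1 \mathrel{-\!\!\ast} \mathit{ewp}\ e\ \langle E \rangle \{\Phi\}$$

Basic weakest precondition

$$\mathit{ewp}\ v\ \langle E \rangle \{\Phi\} \triangleq \ldots$$
$$\mathit{ewp}\ (\mathsf{eff}\ \ell\ v\ K)\ \langle E \rangle \{\Phi\} \triangleq \exists \Psi.\ (\ell, \Psi) \in E \ast (\uparrow\!\Psi)\ v\ (\lambda w.\ \triangleright\, \mathit{ewp}\ K[w]\ \langle E \rangle \{\Phi\})$$
$$\mathit{ewp}\ e\ \langle E \rangle \{\Phi\} \triangleq \ldots\ S(\sigma') \ast \mathit{ewp}\ e'\ \langle E \rangle \{\Phi\}$$

Persistent upward closure

$$(\uparrow\!\Psi)\ v\ \Phi \triangleq \exists \Phi'.\ \Psi\ v\ \Phi' \ast \Box\, \forall w.\ \Phi'(w) \mathrel{-\!\!\ast} \Phi(w)$$

Validity-and-distinctness property

$$\mathrm{ValidDistinct}\ L \triangleq \mathrm{NoDup}\ L \wedge \bigwedge_{\ell \in L} \ell \mapsto^{\Box} ()$$

Fig. 6. Definition of the weakest precondition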


$\{\Phi_1\}.\{\Phi_2\}$, defined as $\lambda v\, \Phi.\ \Phi_1(v) \ast \Box\, \forall w.\ \Phi_2(w) \mathrel{-\!\!\ast} \Phi(w)$. We use this protocol
(in the interpretation of signatures, Figure 7) to attach a precondition $\Phi_1$ and a
postcondition $\Phi_2$ to an effect: performing an effect with payload v is permitted
if $\Phi_1(v)$ holds, and one can assume that it returns a value w such that $\Phi_2(w)$
holds. The symbol $\Box$ is Iris’s persistence modality. Here, it reflects the fact that
continuations are multi-shot: a single perform expression can “return” several
times with several different values of w, so we must be prepared to exploit $\Phi_2$
several times.


To reason about labeled effects, we introduce the notion of a protocol list E,
a list of pairs of a label and a protocol. Therefore, whereas Hazel’s weakest
precondition modality is parameterized with a single protocol, ours is
parameterized with a protocol list. In our setting, the assertion $\mathit{wp}\ e\ \langle E \rangle \{\Phi\}$ means
that (1) it is safe to execute e; (2) if e produces a value v then $\Phi(v)$ holds; and
(3) if e performs an effect labeled $\ell$ then it does so according to a protocol $\Psi$ such
that $(\ell, \Psi) \in E$ holds. Its definition appears in Figure 6. It is broadly similar
to Hazel’s wp modality, save for three aspects: the use of a protocol list E; the use
of a persistent upward closure; and the appearance of a validity-and-distinctness
property as an assumption of the weakest precondition assertion. The persistent
upward closure again has to do with the fact that continuations are multi-shot.
The validity-and-distinctness property expresses two properties of the labels in
the list E: first, these labels are pairwise distinct; second, these labels have been
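allocated. The latter fact is expressed by a persistent points-to assertion [37].

Interpretation of types (selected cases)

$$\mathcal{V}[\![\tau \xrightarrow{\rho} \kappa]\!]^{\delta}_{\eta}(v) \triangleq \forall w.\ \mathcal{V}[\![\tau]\!]^{\delta}_{\eta}(w) \mathrel{-\!\!\ast} \mathit{wp}\ (v\ w)\ \langle \mathcal{R}[\![\rho]\!]^{\delta}_{\eta} \rangle \{\mathcal{V}[\![\kappa]\!]^{\delta}_{\eta}\}$$
$$\mathcal{V}[\![\forall \theta.\ \tau]\!]^{\delta}_{\eta}(v) \triangleq \forall E.\ \mathcal{V}[\![\tau]\!]^{\delta}_{\eta[\theta \mapsto E]}(v)$$

Interpretation of rows and signatures

$$\mathcal{R}[\![\rho]\!]^{\delta}_{\eta} \triangleq \bigcup_{\sigma \in \rho} \mathcal{S}[\![\sigma]\!]^{\delta}_{\eta}$$
$$\mathcal{S}[\![s : \tau \Rightarrow \kappa]\!]^{\delta}_{\eta} \triangleq [(\delta(s),\ \{\mathcal{V}[\![\tau]\!]^{\delta}_{\eta}\}.\{\mathcal{V}[\![\kappa]\!]^{\delta}_{\eta}\})]$$
$$\mathcal{S}[\![\theta]\!]^{\delta}_{\eta} \triangleq \eta(\theta)$$

Interpretation of typing judgments

$$\Xi \mid \Delta \mid \Gamma \vDash e : \rho : \tau \triangleq \forall \eta, \delta, vs.\ \mathcal{G}[\![\Gamma]\!]^{\delta}_{\eta}(vs) \mathrel{-\!\!\ast} \mathit{wp}\ (e[vs][\delta])\ \langle \mathcal{R}[\![\rho]\!]^{\delta}_{\eta} \rangle \{\mathcal{V}[\![\tau]\!]^{\delta}_{\eta}\}$$
$$\mathcal{G}[\![\Gamma]\!]^{\delta}_{\eta}(vs) \triangleq \mathop{\ast}_{(x\,:\,\tau) \in \Gamma} \mathcal{V}[\![\tau]\!]^{\delta}_{\eta}(vs(x))$$

Fig. 7. Interpretation of types, rows, signatures, and typing judgments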


This notion of wp enjoys a set of reasoning rules that we omit. The following 
theorem states that it is sound to reason about programs by means of these 
rules: 


Theorem 1 (Soundness of TESLOGIC). If $\mathit{wp}\ e\ \langle [] \rangle \{\Phi\}$ holds, then e is safe.


With TESLOGIC at hand, let us come back to the definition of the semantic
judgment $\Xi \mid \Delta \mid \Gamma \vDash e : \rho : \tau$.

As usual, a type $\tau$ is interpreted as a semantic type, that is, a persistent
predicate $\mathcal{V}[\![\tau]\!]^{\delta}_{\eta}$ on values. More unusually, a row $\rho$ is interpreted as a protocol
list $\mathcal{R}[\![\rho]\!]^{\delta}_{\eta}$, defined as $\bigcup_{\sigma \in \rho} \mathcal{S}[\![\sigma]\!]^{\delta}_{\eta}$, the list concatenation of the interpretations
of the elements of $\rho$. The environment $\delta$ maps effect names to effect labels; $\eta$
maps type variables to semantic types and row variables to protocol lists.


This said, our interpretation of types (Figure 7) is mostly standard [19]. The
interpretation of a function type, $\mathcal{V}[\![\tau \xrightarrow{\rho} \kappa]\!]^{\delta}_{\eta}$, is the set of values v such that
the application of v to a value w in $\mathcal{V}[\![\tau]\!]^{\delta}_{\eta}$ satisfies a wp assertion with protocol
list $\mathcal{R}[\![\rho]\!]^{\delta}_{\eta}$ and postcondition $\mathcal{V}[\![\kappa]\!]^{\delta}_{\eta}$. What is crucial is that the validity-and-
distinctness property that we have built into the definition of wp formalizes
the requirement that effect names be pairwise distinct. The interpretation of an
effect-polymorphic type involves a quantification $\forall E$ over protocol lists.


Theorem 2 (Fundamental Theorem). The syntactic judgment entails the
semantic judgment: $\Xi \mid \Delta \mid \Gamma \vdash e : \rho : \tau \implies \Xi \mid \Delta \mid \Gamma \vDash e : \rho : \tau$.


We establish this theorem by induction on the syntactic typing judgment. For 
every syntactic typing rule, we prove that the interpretation of the conclusion 
follows from the interpretations of the premises. 

The previous two theorems lead directly to the desired type soundness result:

Theorem 3 (Soundness of TES). If $\emptyset \mid \emptyset \mid \emptyset \vdash e : \langle\rangle : \mathsf{unit}$, then e is safe.


6 Related Work 


Hillerström and Lindley [14] study the core calculus of Links [9], a functional
programming language for web applications, which they extend with support 
for effect handlers. Taking advantage of Links’s row-based approach to type- 
checking records, they annotate function types with rows of effects. Their rows 
use Rémy’s kind discipline [32] to ensure that an effect name can never appear 
twice in a row. 

Leijen [22] formalizes a subset of the Koka language [23]. He presents a cal- 
culus with support for handlers and globally defined effects, a type system with 
value and effect polymorphism, and a compilation strategy for explicitly-typed 
programs. This strategy relies on a selective CPS transformation [26], which 
he extends with support for effect polymorphism. A row in Leijen’s system is 
univariate: it contains at most one row variable. TES, in contrast, allows a row 
to contain several row variables. This ability is exploited, for example, in the 
typing rule LEXHANDLE. Indeed, the premise contains the effect-polymorphic
type $\forall \theta.\ (\alpha \xrightarrow{\theta} \beta) \xrightarrow{\theta \cdot \rho} \tau$, where $\theta$ abstracts away the fresh effect label that is
allocated by lex-handle.

A notable omission from Leijen’s formalization is Koka’s inject [21], which 
is akin to a lift coercion. Biernacki et al. [4] are the first authors to provide 
a formal treatment of such a construct. They define its operational semantics 
and they propose a type system with effect polymorphism and univariate rows. 
They present the first binary logical relations for effect handlers, and they use 
these relations to prove that their system is sound. In a later paper [5], the same 
authors introduce $\lambda^{\mathsf{HEL}}$, a calculus that supports both dynamic allocation of
effect labels and effect coercions. In addition to the lift coercion, they consider 
(1) the swap coercion, which exchanges two effects in a row; (2) the cons coercion, 
which rearranges effects deep in a row; and (3) composition of coercions. These 
new coercions do not add expressiveness: they can be expressed in terms of 
lift. Still, they help programmers control the dynamic search for a handler. 
Biernacki et al. propose a type system with support for universal and existential 
types. Although counter, discussed in Sections 2 and 4, is expressible in $\lambda^{\mathsf{HEL}}$,
Biernacki et al.’s type system does not accept this program. (This has been
confirmed by the authors in a personal communication.) The technical reason
why counter is ill-typed is that the subsumption rules are not sufficiently flexible:
an abstract row $\theta$ cannot be weakened to a larger row. It is not trivial how to
overcome this issue, because the interpretation of a signature in Biernacki et al.’s 
system depends on the signature’s position in the row. TES, in contrast, allows 
extension, thanks to the rule EXTEND. 

Zhang and Myers [41] present “a new semantics based on tunneling”, which 
they claim avoids “accidental handling” by construction. As far as we understand, 
however, they do not propose a semantics in the usual sense, that is, a reduction 
semantics. Instead, their “semantics” seems to be a translation of the surface
language into a core calculus, $\lambda_{\Uparrow\Downarrow}$. This translation is not formally defined: it is
sketched by way of examples. Furthermore, as noted by Biernacki et al. [6], there
is a discrepancy between the paper presentation of $\lambda_{\Uparrow\Downarrow}$ and its Coq formalization.
The paper does not mention dynamic generation of effect labels, but the calculus 
that is formalized in Coq supports this feature via a construct that generates a 
fresh effect label and installs a handler for this label; in other words, a lexically 
scoped handler. 

For this calculus with lexically scoped handlers, Zhang and Myers propose a 
type system with support for effect polymorphism. They prove its soundness us- 
ing binary logical relations. Then, they exploit these logical relations to establish 
interesting typed contextual equivalence laws. One law [41, Example 1] shows 
that an effect-polymorphic function cannot intercept the effects represented by 
an abstract row variable. This law seems to express the intuitive idea of “absence 
of accidental handling”, but we remark that this notion is never formally defined. 

Zhang and Myers [41] and other authors [8] suggest that “absence of accidental
handling”, sometimes also referred to as “effect safety”, has something to do
with parametricity. Unfortunately, “parametricity” itself is a somewhat loosely
defined concept. As far as we understand, the word “parametricity” refers to the
fact that a syntactic universal type is interpreted via a meta-level universal quan- 
tification over a certain universe of semantic types. However, the strength of this 
meta-level quantification depends on which universe of semantic types is chosen. 
A smaller universe yields a system with weaker universal types, which may enjoy 
fewer equivalence laws, but may also admit more well-typed programs. 

To illustrate this point, let us ask whether our calculus, TESLANG, can be ex- 
tended with a “dynamic-wind” construct [11]. This construct, dynamic-wind p e q, 
monitors the execution of e and invokes the thunk p whenever control enters e 
(at the beginning of e’s execution and every time e is resumed) and invokes the 
thunk q whenever control leaves e (at the end of e’s execution and every time e 
performs an effect). To type-check this construct, one might extend TES with 
the following typing rule: 


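DYNAMICWIND

$$\frac{\Xi \mid \Delta \mid \Gamma \vdash e : \rho : \tau \qquad \Xi \mid \Delta \mid \Gamma \vdash p : \rho : \mathsf{unit} \to \mathsf{unit} \qquad \Xi \mid \Delta \mid \Gamma \vdash q : \rho : \mathsf{unit} \to \mathsf{unit}}{\Xi \mid \Delta \mid \Gamma \vdash \textsf{dynamic-wind}\ p\ e\ q : \rho : \tau}$$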


We have proved that this rule is sound with respect to the interpretation of 
types presented in Section 5. So, our semantic model supports dynamic-wind. 
Furthermore, our semantic model arguably enjoys “parametricity”, since a univer- 
sal type is interpreted via a meta-level universal quantification. Yet, introducing 
dynamic-wind breaks Zhang and Myers’s desired equivalence law [41, Example 1], 
because it allows observing arbitrary effects, without knowledge of their name 
and type. Therefore, “parametricity” does not guarantee “absence of accidental 
handling”. 

The lesson that we draw from this remark is that a programming language 
designer is faced with a tension between making the language more powerful
by introducing constructs such as dynamic-wind, allowing new programs to be
written, and making the language less powerful by forbidding such constructs, 
thereby validating new equivalence laws. Our (unary) semantic model (§5) errs 
on the side of admitting more constructs and fewer equivalence laws. In future 
work, it would be interesting to propose a (binary) semantic model that ad- 
mits fewer constructs and validates more laws, so as to prove that TES without 
dynamic-wind validates Zhang and Myers’s law [41, Example 1]. 

Despite their previous studies of coercions [4,5], Biernacki et al. [6] argue 
against coercions, which they deem impractical for real-world programming, and 
propose a type system for a language that supports lexically scoped handlers 
only. They present two semantics for this language: (1) an open semantics, where 
effect names are not substituted with labels, and where evaluation is defined 
among open terms in a capture-avoiding way; and (2) a generative semantics, 
where effect names are substituted at runtime with effect labels, as in TESLANG. 
By means of binary logical relations, they prove that the type system is sound 
and that the two semantics are equivalent. 

Kammar and Pretnar [18] show that a calculus with effects and handlers but 
without references and without dynamic allocation of effect labels admits a type 
system with unrestricted polymorphism. Thus, generalization applies even to an 
expression that performs and handles effects. Kammar and Pretnar establish the 
soundness of their system via a syntactic approach [40]. The version of TES that 
we have formalized in Coq [36] distinguishes pure and impure expressions and 
allows generalizing the type of a pure expression. The pure expressions include 
expressions that perform or handle effects. Allocating a fresh effect label is still 
considered impure. Although such an allocation seems intuitively harmless, our 
current semantic model interprets allocation as an Iris “update”, and Iris does 
not allow exchanging a universal quantifier with an update modality, so we are 
unable to justify that allocation is pure. We conjecture that this problem would 
perhaps not appear in a syntactic approach. 


7 Conclusion 


In this paper, we have argued in favor of a simple semantics for effect handlers, 
where the dynamic search for a handler is based purely on equality of effect 
labels, and where fresh labels can be generated at runtime. This language can 
express, but is not restricted to, lexically scoped handlers. We have proposed a 
type system equipped with type and effect polymorphism and with a powerful 
subsumption relation. A distinguishing feature is the idea that a row expresses 
a disjointness requirement on effect labels. We have established type soundness 
via a semantic approach. 

In future work, it would be desirable to strengthen our semantic model and 
turn it into a binary model, so as to establish contextual equivalence laws such 
as Zhang and Myers’s [41]. We also wish to investigate support for modules and 
inference of principal types, with the ultimate aim of proposing a strong type 
system for OCaml 5.

Abstract. Knowledge-based programs specify multi-agent protocols with epistemic
guards that abstract from how agents learn and record facts or information
about other agents and the environment. Their interpretation involves a mutual
dependency between the evaluation of epistemic guards over the reachable states
and the derivation of the reachable states depending on the evaluation of epistemic
guards. We apply the technique of a must/cannot analysis from synchronous
programming languages to the interpretation problem of knowledge-based programs
and demonstrate that the resulting constructive interpretation is monotone and has
a least fixed point. We relate our approach with existing interpretation schemes for
both synchronous and asynchronous programs and illustrate the procedure
by several examples and an application to the Java memory model.


1 Introduction 


Knowledge-based programs [14] describe multi-agent systems based on explicit know- 
ledge tests on what an agent knows or does not know about itself, other agents, and 
the environment: Extending standard programs, an agent may look beyond what it can 
directly observe by reasoning about the possible states of the other agents and the envir- 
onment in all possible program executions. Such non-local, epistemic conditions abstract 
from how an agent may learn and record particular environmental facts or information 
about other agents. Thus knowledge-based programs rather are specifications of (multi- 
agent) protocols that may be implemented by standard, directly executable programs. For 
being implementable in the first place, however, it has to be ensured that the knowledge 
guards can be resolved consistently given all possible program executions. 

Consider for example a bit transmission [14, Ex. 4.1.1, Ex. 7.1.1], where a sender S 
has to transmit a bit sbit over a lossy channel to a receiver R who has to acknowledge the 
reception, again over a lossy channel. This can be modelled by a knowledge-based program 
over the state variables $\mathit{sbit} \in \{0, 1\}$, $\mathit{rval} \in \{\bot, 0, 1\}$, and $\mathit{ack} \in \{0, 1\}$ as follows: S
can only directly observe (read) sbit and ack, and R only rval (but both may write all
variables); $(K_R\, \mathit{sbit} = 0) \vee (K_R\, \mathit{sbit} = 1)$ expresses that R knows sbit’s value and is
abbreviated by $K_R\, \mathit{sbit}$. The behaviour description consists of a looping guarded command
with two branches that is started with $\mathit{rval} = \bot$ and $\mathit{ack} = 0$, but sbit left undetermined:

    do  ¬K_S K_R sbit → (rval ← sbit or skip)                   -- S
    []  K_R sbit ∧ ¬K_R K_S K_R sbit → (ack ← 1 or skip)  od    -- R


The guarded branches are separated by a [], or means a non-deterministic choice, and
skip doing nothing: S sends the bit as long as it does not know that R received it, and R
keeps acknowledging once it has learnt the bit and does not know that S knows this fact.
The epistemic formulæ $K_a\, \varphi$ in the program are to be interpreted as in classical Kripke
semantics: $\varphi$ holds in all states (or worlds) that agent a currently deems possible. Which
states these are is regulated on the one hand by what a can observe: any state that is
indistinguishable from the current one by the available observations is possible for the
agent. In the example only S can observe sbit, though, due to the protocol, it should be
possible that eventually R knows its value. On the other hand, the possible states depend
on which runs of the knowledge-based program may actually happen, i.e., which states
are reachable taking epistemically guarded transitions: If only the actions of the program
are taken, it is impossible to reach a state satisfying both $\mathit{rval} \neq \bot$ and $\mathit{rval} \neq \mathit{sbit}$,
which, however, is present in the global state space; but it is decisive that it is not reachable
in any execution in order to have some execution where $K_R\, \mathit{sbit}$ can become true.

The interpretation of knowledge-based programs hinges precisely on this mutual 
dependency between the evaluation of epistemic guards over the reachable states and the 
derivation of the reachable states depending on the evaluation of the epistemic guards. 
This implicit definition of the epistemic state of the agents by the observables and the 
reachable states of the commonly known protocol is in stark contrast to Baltag’s epistemic 
action models [4,31], where the epistemic state is given and manipulated explicitly. In 
many cases, including the bit transmission protocol, the reachable state space may be 
computed using static analysis techniques without taking into account the epistemic 
nature of the guards. However, the interplay between knowledge and reachability may 
sometimes become more intricate: The more states are reachable the less is known 
definitely, and the guards will in turn influence what is reachable positively or negatively. 

Consider, for another example, a variable setting problem [14, Exc. 7.5] involving 
a single agent a and a single state variable $x \in \{0, 1, 2, 3\}$, where a cannot observe x
directly. The agent executes the following guarded command starting with x = 0:

    if  K_a x ≠ 1 → x ← 3
    []  K_a x ≠ 3 → x ← 1  fi


Being an initial condition, x = 0 is reachable, whereas x = 2 is not reachable as 2 is 
never assigned. However, two different sets of reachable states make for a consistent 
interpretation of the knowledge guards for the remaining values: {x = 0, x = 1}, where 
$K_a\, x \neq 1$ is false and $K_a\, x \neq 3$ is true, and {x = 0, x = 3}, with the opposite results.
The singleton set {x = 0} is ruled out, since both guards would be true such that x = 3 
and x = 1 are reachable; and {x = 0, x = 1, x = 3} is impossible, since both guards are 
false and thus neither x = 1 nor x = 3 are reachable. Breaking this cycle by making one 
of the transitions unconditional on knowledge as, e. g., in 


    if  K_a x ≠ 1 → x ← 3
    []  K_a x ≠ 3 → x ← 2
    []  true → x ← 1  fi


yields a knowledge-based program with the unique consistent interpretation {x = 1, x = 
2}. For computing its behaviour, however, several steps are needed, first reasoning that 
x = 1 is reachable, then that x = 3 is not reachable, and, finally, that x = 2 is reachable. 
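The consistency argument for the two variable-setting programs can be replayed by brute force. The following is a minimal sketch under simplifying assumptions that are not taken from the text: since a cannot observe x, every state assumed reachable is possible for a, so a guard $K_a\, x \neq v$ is taken to hold exactly when v is not in the assumed set; a candidate set counts as consistent when the states reachable under that reading of the guards are the candidate itself. All function names are hypothetical, and the computed sets always list the initial state x = 0 explicitly.

```python
from itertools import combinations

STATES = (0, 1, 2, 3)
INITIAL = 0


def knows_not(value, assumed):
    """K_a (x != value): holds iff `value` is not among the assumed-reachable states."""
    return value not in assumed


def reachable(program, assumed):
    """States reachable from INITIAL when guards are evaluated over `assumed`.

    Each branch assigns a constant, so one round of enabled branches suffices.
    """
    targets = {target for guard, target in program if guard(assumed)}
    return frozenset({INITIAL} | targets)


def consistent_interpretations(program):
    """All candidate sets (containing INITIAL) that reproduce themselves."""
    others = [s for s in STATES if s != INITIAL]
    result = []
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            assumed = frozenset((INITIAL, *extra))
            if reachable(program, assumed) == assumed:
                result.append(set(assumed))
    return result


# if K_a x != 1 -> x <- 3  []  K_a x != 3 -> x <- 1  fi
cyclic = [
    (lambda r: knows_not(1, r), 3),
    (lambda r: knows_not(3, r), 1),
]

# Cycle-breaking variant with an unconditional third branch.
broken = [
    (lambda r: knows_not(1, r), 3),
    (lambda r: knows_not(3, r), 2),
    (lambda r: True, 1),
]

print(consistent_interpretations(cyclic))   # two consistent sets
print(consistent_interpretations(broken))   # a single consistent set
```

Under these assumptions the first program yields the two sets {0, 1} and {0, 3}, and the cycle-breaking variant yields a single set in which, besides the initial state, x = 1 and x = 2 are reachable.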
Related Work. In their introduction and seminal treatise on knowledge-based
programs [13,14], Fagin et al. characterise the unique interpretability of such programs by
their “dependence on the past” w. r. t. some non-empty class of transition systems: The eval- 
uation of knowledge guards in a state coincides for all interpretations in the class that share 
a common past of the state. A sufficient condition for this dependence is that the program 
“provides epistemic witnesses” for all interpretations of the class such that not knowing 
something at some point in time has a counter example in the past. A sufficient condition 
for this provision, in turn, is that the program is “synchronous”, i.e., that all agents can de- 
termine the global time from their local states. For example, the bit transmission protocol 
provides epistemic witnesses and thus is uniquely interpretable; but it is not synchronous. 
The cycle-breaking variable setting program is also uniquely interpretable, but does not 
provide epistemic witnesses. For “asynchronous” knowledge-based programs, De Haan et 
al. [10] suggest to rely on classical iteration of the non-monotone reachability functional 
that interprets the knowledge modalities according to what currently is assumed to be reach- 
able. The computation process is started with all states assumed to be reachable and stops 
when some set of states is repeated. This approach fixes some semantics for all knowledge- 
based programs, also for those which are cyclic and contradictory or only self-fulfilling. 
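The iteration described above (start from all states, re-evaluate reachability, stop at the first repeated set) can be illustrated on the variable-setting programs from the introduction. This is only a sketch of the idea as stated in this paragraph, in a simplified state-based model that is an assumption made here: the agent observes nothing, so a guard $K_a\, x \neq v$ holds iff v is not in the currently assumed set. The names `reachable` and `iterate_from_all` are hypothetical and do not come from De Haan et al.

```python
def reachable(program, assumed, initial=0):
    """One application of the (non-monotone) reachability functional."""
    targets = {target for guard, target in program if guard(assumed)}
    return frozenset({initial} | targets)


def iterate_from_all(program, states):
    """Iterate from 'everything is reachable' until some set is repeated.

    Returns the list of distinct sets visited and the set that was repeated.
    """
    seen = []
    current = frozenset(states)
    while current not in seen:
        seen.append(current)
        current = reachable(program, current)
    return seen, current


# if K_a x != 1 -> x <- 3  []  K_a x != 3 -> x <- 1  fi
cyclic = [
    (lambda r: 1 not in r, 3),
    (lambda r: 3 not in r, 1),
]

# if K_a x != 1 -> x <- 3  []  K_a x != 3 -> x <- 2  []  true -> x <- 1  fi
broken = [
    (lambda r: 1 not in r, 3),
    (lambda r: 3 not in r, 2),
    (lambda r: True, 1),
]

trace_cyclic, repeated_cyclic = iterate_from_all(cyclic, range(4))
trace_broken, repeated_broken = iterate_from_all(broken, range(4))
```

For the cyclic program the iteration alternates between {0} and {0, 1, 3}, neither of which is a consistent interpretation, whereas for the cycle-breaking variant it settles on a set that reproduces itself. This matches the remark that the scheme assigns some semantics even to cyclic programs.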

The problem of mutual dependence of guard evaluation and reachability has also 
occurred in the design of synchronous programming languages [6] for embedded systems, 
like Esterel [7] or Lustre [18], which rely on “perfect synchrony”: a step for reacting to 
some inputs takes zero time and output signals are produced at exactly the same time as the 
input signals. Since thus the status of a signal to be produced can be queried at the same 
time, this requires “logical coherence” saying that a (non-input) signal is present in a step of 
execution if, and only if, a command emitting this signal is executed in this step. Whereas 
Lustre forbids cyclic programs on a syntactic basis, Berry’s approach to the semantics 
of Esterel [8] singles out “reactive” — at least one execution — and “determinate” — 
at most one execution — programs using a static executability analysis: It is computed 
which signals must be present, i.e., have to occur inevitably, and which signals cannot be 
present, i.e., have no emitting execution. This is also referred to as must/cannot analysis 
and has to be performed several times for finding a fixed point of all the signal statuses. 

In logic programming involving “negation as failure” under- and over-approximations 
in terms of three- and four-valued logics lead to the “Kripke-Kleene fixpoint” and “well- 
founded” models; see [11] for an overview. There, however, the temporal dimension of 
reachability or executability is not involved. The “stable model semantics” [16,5] stresses 
the rational inclusion or exclusion of atoms: A set of atoms M is “stable” for a logic
program $\Pi$ if it coincides with the minimal set of atoms inferable from the “reduct” $\Pi^M$
which is obtained from $\Pi$ by deleting each clause that has a negative literal $\neg p$ in its
body with $p \in M$, and all negative literals in the bodies of the remaining clauses. The
definition is not algorithmic or constructive; the minimality condition rules out self- 
fulfilling solutions, the reduction process avoids contradictions. Gelfond’s “epistemic 
specifications” [15] extend (disjunctive) logic programs with a modality K for “subjective 
literals” for representing incomplete information in programs with several stable models. 
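The reduct-based definition of stability recalled above translates almost literally into a brute-force check. The sketch below assumes a propositional program represented as a list of clauses `(head, positive_body, negative_body)`; this representation and all function names are illustrative choices made here, not part of the cited work.

```python
from itertools import chain, combinations


def reduct(program, candidate):
    """Drop clauses with a negative literal 'not p' where p is in `candidate`;
    strip the negative literals from the remaining clauses."""
    return [
        (head, pos)
        for head, pos, neg in program
        if not (set(neg) & candidate)
    ]


def least_model(positive_program):
    """Minimal set of atoms inferable from a negation-free program."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_program:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model


def is_stable(program, candidate):
    return least_model(reduct(program, candidate)) == set(candidate)


def stable_models(program):
    """Enumerate all stable models by trying every set of atoms."""
    atoms = sorted(
        {head for head, _, _ in program}
        | {a for _, pos, neg in program for a in chain(pos, neg)}
    )
    subsets = chain.from_iterable(
        combinations(atoms, n) for n in range(len(atoms) + 1)
    )
    return [set(m) for m in subsets if is_stable(program, set(m))]


# p :- not q.   q :- not p.     (two stable models)
choice = [("p", [], ["q"]), ("q", [], ["p"])]
# p :- p.                       (self-fulfilling: only the empty model is stable)
self_fulfilling = [("p", ["p"], [])]
# p :- not p.                   (contradictory: no stable model)
contradictory = [("p", [], ["p"])]
```

The three sample programs show the two effects mentioned in the text: minimality rejects the self-fulfilling solution {p} of `p :- p`, and the reduction step leaves the contradictory program `p :- not p` without any stable model.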


Contributions. We apply the principles of the must/cannot analysis to the interpretability 
problem of knowledge-based programs. After recalling some basic notions of epistemic 
logic and epistemic transition structures (Sect. 2), we first recapitulate the approaches
by Fagin et al. [14] and De Haan et al. [10] in terms of epistemically guarded transition
systems, a syntax-agnostic format for knowledge-based programs (Sect. 3). For a 
more direct analysis, our account of those designs is state-based rather than run-based. 
We demonstrate the results and the limits of both interpretation schemes by several 
examples that illustrate (a-)synchronicity and non-monotone interpretation for cyclic, 
contradictory, or self-fulfilling programs. The latter behaviour is the main motivation for 
our reformulation of the interpretation problem in terms of epistemic must/can transition 
structures which offer lower and upper bounds on the behaviour of a knowledge-based 
program (Sect. 4). We show that this constructive interpretation is always monotone 
and yields a least fixed point. However, lower and upper bound of the fixed point need 
not always coincide and we relate decided fixed points with the notions of “providing 
epistemic witnesses” and synchronicity. We then derive a representation of the behaviour 
of a knowledge-based program as a general rule system with not only positive but 
also negative premisses (Sect. 5). Such rule systems correspond to logic programs 
involving “negation as failure” and the intended solutions form “stable models”. The 
must/can approximation technique, its monotonicity, and its fixed point properties directly
transfer to such rule systems. We finally describe an implementation of our constructive 
interpretation approach in the “Temporal Epistemic Model Interpreter and Checker”
(TEMIC, Sect. 6). For model checking interpreted knowledge-based programs, the tool
supports CTLK, the combination of “Computational Tree Logic” (CTL) with epistemic 
logic. Moreover, this logic can also be used in program guards; the interpretation of 
such temporal-epistemic programs extends the previous approaches. We give some 
applications to the analysis of the Java memory model. 




2 Epistemic Logic and Epistemic Transition Structures 


We briefly summarise the basic notions of epistemic logic for expressing knowledge 
guards [31,30]. We then define epistemic transition structures as the domain of interpret- 
ation of knowledge-based programs. These transition structures combine the temporal 
dimension of executing a program with the epistemic dimension for evaluating what 
agents know. Both the logic and the transition structures are built over an epistemic 
signature $\Sigma = (P, A)$ that consists of a set of propositions P and a set of agents A.


2.1 Epistemic Logic 


An epistemic structure $K = (W, R, L)$ over $(P, A)$ is given by a set of worlds W, an
A-family of epistemic accessibility relations $R = (R_a \subseteq W \times W)_{a \in A}$, and a labelling
$L : W \to \wp P$ assigning each world a set of propositions. In concrete examples, we will
require $R_a$ to be an equivalence relation such that if $(w_1, w_2) \in R_a$, then agent a cannot
distinguish between the two worlds $w_1$ and $w_2$. The epistemic formulæ $\varphi \in \Phi_{P,A}$ over
$(P, A)$ are defined by the following grammar:

$$\varphi ::= p \mid \mathrm{false} \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid K_a\, \varphi$$

where $p \in P$ and $a \in A$. The epistemic formula $K_a\, \varphi$ is to be read as “agent a knows $\varphi$”.
We use the usual propositional abbreviations true for $\neg\mathrm{false}$ and $\varphi_1 \vee \varphi_2$ for $\neg(\neg\varphi_1 \wedge \neg\varphi_2)$.
Furthermore, we consider the epistemic modality M as the dual of K, such that $M_a\, \varphi$
abbreviates $\neg K_a\, \neg\varphi$ and is to be read as “agent a deems $\varphi$ possible”. The satisfaction
relation of an epistemic formula $\varphi \in \Phi_{P,A}$ over an epistemic structure $K = (W, R, L)$
over $(P, A)$ at a world $w \in W$, written $K, w \models \varphi$, is inductively defined by


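\[
\begin{array}{lcl}
K, w \models p & \iff & p \in L(w) \\
K, w \not\models \mathrm{false} & & \\
K, w \models \neg\varphi & \iff & K, w \not\models \varphi \\
K, w \models \varphi_1 \wedge \varphi_2 & \iff & K, w \models \varphi_1 \text{ and } K, w \models \varphi_2 \\
K, w \models K_a\,\varphi & \iff & K, w' \models \varphi \text{ f.a. } w' \in W \text{ with } (w, w') \in R_a
\end{array}
\]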
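The inductive definition above translates almost literally into a recursive evaluator. The following Python fragment is only a sketch under assumptions of our own: formulæ are encoded as nested tuples, and the names `sat` and `possible` as well as the two-world structure are hypothetical illustrations, not part of the paper or its tool.

```python
def sat(W, R, L, w, f):
    """Decide K, w |= f for the epistemic structure K = (W, R, L)."""
    tag = f[0]
    if tag == "prop":                       # K, w |= p  iff  p in L(w)
        return f[1] in L[w]
    if tag == "false":                      # false is never satisfied
        return False
    if tag == "not":
        return not sat(W, R, L, w, f[1])
    if tag == "and":
        return sat(W, R, L, w, f[1]) and sat(W, R, L, w, f[2])
    if tag == "K":                          # all R_a-accessible worlds satisfy the body
        a, g = f[1], f[2]
        return all(sat(W, R, L, v, g) for v in W if (w, v) in R[a])
    raise ValueError(f"unknown connective: {tag}")


def possible(a, g):
    """M_a g, encoded as the abbreviation not K_a not g."""
    return ("not", ("K", a, ("not", g)))


# Hypothetical two-world structure: agent a cannot tell w1 and w2 apart.
W = {"w1", "w2"}
R = {"a": {(u, v) for u in W for v in W}}
L = {"w1": {"p"}, "w2": set()}
p = ("prop", "p")

knows_p = sat(W, R, L, "w1", ("K", "a", p))              # p fails in w2
deems_p_possible = sat(W, R, L, "w2", possible("a", p))  # p holds in w1
```

In this toy structure agent a does not know p at w1, because the indistinguishable world w2 violates p, but a deems p possible at w2.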


2.2 Epistemic Transition Structures 


An epistemic transition structure combines a temporal transition relation with an epistemic 
accessibility relation over a common set of states. The transitions describe which states 
can be reached from a set of initial states, the accessibilities specify which states are 
indistinguishable. Knowledge formulæ are evaluated over the associated global epistemic 
structure. This derived structure has the reachable states as its worlds and reuses the 
accessibility relation and the labelling but restricted to the reachable states. 

Formally, an epistemic transition structure $M = (S, E, L, S_0, T)$ over $(P, A)$ is
given by an epistemic structure $(S, E, L)$, a set of temporally initial states $S_0 \subseteq S$,
and a temporal transition relation $T \subseteq S \times S$. We write $S(M)$ for $S$, $T(M)$ for $T$,
etc. The (temporally) reachable states $S_\omega(M) = \bigcup_{0 \le k} S_k(M)$ and transition relation
$T_\omega(M) = \bigcup_{0 \le k} T_k(M)$ of $M$ are inductively defined by

\[
\begin{array}{ll}
S_0(M) = S_0, & S_{k+1}(M) = S_k(M) \cup \{ s' \mid \text{ex. } s \in S_k(M) \text{ s.t. } (s, s') \in T \}; \\
T_0(M) = \emptyset, & T_{k+1}(M) = T_k(M) \cup \{ (s, s') \in T \mid s \in S_k(M) \}.
\end{array}
\]

The associated epistemic structure of $M$ is given by

\[ K(M) = (S_\omega(M),\ E \cap S_\omega(M)^2,\ L{\restriction}S_\omega(M)) \]

where $S_\omega(M)^2$ abbreviates $S_\omega(M) \times S_\omega(M)$ and $L{\restriction}S_\omega(M)$ denotes labelling $L$ restricted
to domain $S_\omega(M)$. The satisfaction relation of an epistemic formula $\varphi \in \Phi_{P,A}$ over
$M$ at an $s \in S_\omega(M)$, written $M, s \models \varphi$, is defined as

\[ M, s \models \varphi \iff K(M), s \models \varphi. \]


The set of epistemic transition structures over $\Sigma = (P, A)$ sharing the same epistemic
state basis $B = (S, E, L, S_0)$ is denoted by $\mathcal{M}_\Sigma(B)$. We say that $M_1 \subseteq M_2$ for
$M_1, M_2 \in \mathcal{M}_\Sigma(B)$ if $T(M_1) \subseteq T(M_2)$ and similarly extend union and intersection
from transition relations to epistemic transition structures.
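For finite state sets, the stages $S_k(M)$ and $T_k(M)$ stabilise after finitely many steps, so the reachable part can be computed by a simple loop. The sketch below assumes states are hashable Python values and transitions are pairs; the function name `reachable` and the example data are hypothetical.

```python
def reachable(S0, T):
    """Return (S_omega, T_omega) for initial states S0 and transition relation T."""
    states, trans = set(S0), set()          # S_0(M) = S0 and T_0(M) = empty set
    while True:
        # T_{k+1}(M): all transitions of T leaving a state of S_k(M)
        step = {(s, t) for (s, t) in T if s in states}
        # S_{k+1}(M): S_k(M) together with the targets of those transitions
        grown = states | {t for (_, t) in step}
        if grown == states and step == trans:
            return states, trans
        states, trans = grown, step


# Hypothetical example: s3 is never reached, so its outgoing transition is dropped.
S_omega, T_omega = reachable({"s0"}, {("s0", "s1"), ("s1", "s2"), ("s3", "s0")})
```

Restricting the accessibility relations and the labelling to `S_omega` then gives the associated epistemic structure over which knowledge formulæ are evaluated.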


3 Knowledge-based Programs


Knowledge-based programs extend standard programs by explicit knowledge tests. Their 
interpretation involves a cycle: the evaluation of the epistemic guards depends on the 
program’s reachable states, the derivation of the reachable states on the evaluation of the 
program’s epistemic guards. 

We render knowledge-based programs in a syntax-agnostic format as epistemically 
guarded transition systems. Like epistemic transition structures, these guarded systems 
operate on a global set of states with epistemic accessibilities and a propositional labelling. 
All program steps are represented as knowledge-guarded actions of the form $\varphi \supset B$ with
$\varphi$ an epistemic formula and $B$ a relation on the semantic states. Knowledge-independent
decisions are obtained by choosing $\varphi = \mathrm{true}$, and any kind of program control structure
can be expressed by a judicious choice of guarded actions.

Breaking up the cyclic step of assigning meaning to a knowledge-based program,
an epistemically guarded transition system $\Gamma$ is interpreted over an epistemic transition
structure $M$ yielding another epistemic transition structure $\Gamma^M$. A guarded action $\varphi \supset B$
of $\Gamma$ contributes those $(s, s') \in B$ for which $M, s \models \varphi$, where, in particular, $s$ is
reachable in $M$. What is sought is a consistent interpretation with $\Gamma^M = M$ such
that reachability and knowledge are mutually justified. Finding such a balanced structure
is complicated by the fact that the interpretation functional is not monotone in general:
the more is reachable, the less is known, and this may make more or fewer states reachable.

After introducing and illustrating our format of knowledge-based programs we 
summarise and adapt two existing approaches to their interpretation that have been devised 
for run-based rather than state-based systems: De Haan et al. [10] propose to iterate the 
interpretation functional starting from an epistemic transition structure where all states are 
reachable. Iteration stops when either a fixed point is reached or, due to non-monotonicity, 
a contradiction is found. In this way all knowledge-based programs are assigned some 
semantics and there is no distinction between meaningful and contradictory or just
self-fulfilling programs. The original approach by Fagin et al. [13,14] characterises
knowledge-based programs that admit a unique consistent interpretation by the notion of dependence
on the past. A sufficient condition of providing epistemic witnesses is developed which, 
in particular, applies to the subclass of synchronous knowledge-based programs. 


3.1 Epistemically Guarded Transition Systems 


An epistemically guarded transition system $\Gamma = (S, E, L, S_0, \mathcal{T})$ over $(P, A)$ is given by
an epistemic state basis $(S, E, L, S_0)$ over $(P, A)$ and a set $\mathcal{T}$ of epistemically guarded
actions $\varphi \supset B$ consisting of an epistemic formula $\varphi \in \Phi_{P,A}$ as guard and a transition
relation $B \subseteq S \times S$.


Example 1. (a) Consider the bit transmission problem of the introduction: 


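\[
\begin{array}{ll}
\mathbf{do} & \neg K_S K_R\,\mathit{sbit} \to (\mathit{rval} \leftarrow \mathit{sbit}\ \mathbf{or}\ \mathbf{skip}) \\
{[\,]} & K_R\,\mathit{sbit} \wedge \neg K_R K_S K_R\,\mathit{sbit} \to (\mathit{ack} \leftarrow 1\ \mathbf{or}\ \mathbf{skip}) \quad \mathbf{od}
\end{array}
\]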


A sender agent $S$ sends a bit $\mathit{sbit} \in \{0, 1\}$ to a receiver agent $R$ over an unreliable channel
by setting $\mathit{rval} \in \{\bot, 0, 1\}$; and $R$ acknowledges the reception over an unreliable channel
by setting $\mathit{ack} \in \{0, 1\}$. Again, we abbreviate $(K_R \neg\mathit{sbit}) \vee (K_R\,\mathit{sbit})$, expressing that
the receiver knows the bit to be sent, by $K_R\,\mathit{sbit}$. We concretise the problem into an
epistemically guarded transition system $\Gamma_{bt} = (B_{bt}, \mathcal{T}_{bt})$ with $B_{bt} = (S_{bt}, E_{bt}, L_{bt}, S_{bt,0})$
over $\Sigma_{bt} = (P_{bt}, A_{bt})$ with $P_{bt} = \{\mathit{sbit}, \mathit{rbit}, \mathit{snt}, \mathit{ack}\}$ and $A_{bt} = \{S, R\}$. Since we
use a propositional encoding, we represent $\mathit{rval} \in \{\bot, 0, 1\}$ by a proposition $\mathit{rbit}$ for
the transmitted bit and a proposition $\mathit{snt}$ for the validity of $\mathit{rbit}$. Further abbreviating
the knowledge guards $K_R\,\mathit{sbit}$ by kr, $K_S K_R\,\mathit{sbit}$ by ksr, and $K_R K_S K_R\,\mathit{sbit}$ by krsr, the
transition system $\Gamma_{bt}$ is graphically given by


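[Figure: transition graph of $\Gamma_{bt}$ over the states $z_0, \ldots, z_7$ with observability sets $O_{bt,S} = \{\mathit{sbit}, \mathit{ack}\}$ and $O_{bt,R} = \{\mathit{rbit}, \mathit{snt}\}$; sending transitions are guarded by ¬ksr?, acknowledging transitions by kr ∧ ¬krsr?.]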


The states $S_{bt}$ comprise $\{z_0, z_1, \ldots, z_7\}$ with $L_{bt}(z_0) = \emptyset$, $L_{bt}(z_1) = \{\mathit{snt}\}$, ...,
$L_{bt}(z_7) = \{\mathit{sbit}, \mathit{rbit}, \mathit{snt}, \mathit{ack}\}$ as outlined in the graph above; the set of initial states is
$S_{bt,0} = \{z_0, z_4\}$. The epistemic accessibility relations $E_{bt,a}$ for $a \in A_{bt}$ are given by
observability sets $O_{bt,a}$ that declare two states $s_1, s_2 \in S_{bt}$ to be $O_{bt,a}$-indistinguishable,
written as $s_1 \sim_{O_{bt,a}} s_2$, if for all $p \in O_{bt,a}$ it holds that $p \in L_{bt}(s_1) \iff p \in L_{bt}(s_2)$,
and consequently $E_{bt,a} = {\sim_{O_{bt,a}}}$, such that $E_{bt,a}$ forms an equivalence relation. Due to
$\mathit{sbit} \notin O_{bt,R}$, the receiver $R$ cannot “see” $\mathit{sbit}$ and hence cannot distinguish between
states $z_0$ and $z_4$, but $S$ can. On the other hand, $R$ can distinguish between $z_1$ and $z_5$ as $R$
has access to $\mathit{rbit}$. Finally, $\mathcal{T}_{bt}$ consists of two epistemically guarded actions

\[
\begin{array}{l}
\neg K_S K_R\,\mathit{sbit} \supset \{(z_i, z_i) \mid 0 \le i \le 7\} \cup \{(z_0, z_1), (z_2, z_3), (z_4, z_5), (z_6, z_7)\} \quad \text{and} \\
K_R\,\mathit{sbit} \wedge \neg K_R K_S K_R\,\mathit{sbit} \supset \{(z_i, z_i) \mid 0 \le i \le 7\} \cup \{(z_0, z_2), (z_1, z_3), (z_4, z_6), (z_5, z_7)\},
\end{array}
\]

which directly reflect the sending and acknowledging actions of the bit transmission
problem: The system can only advance from $z_0$ to $z_1$ (and $z_4$ to $z_5$), where sending has
been done successfully, if $S$ does not know that $R$ knows the bit; but it need not make
such progress, i.e., sending can be unsuccessful. Similarly, the system can only advance
from $z_1$ to $z_3$ (and $z_5$ to $z_7$), where an acknowledgement has been sent successfully, if $R$
knows the bit and $R$ does not know that $S$ knows that $R$ knows the bit.


(b) Consider the variable setting problem of the introduction for a single agent a: 


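\[
\begin{array}{ll}
\mathbf{if} & K_a\, x \neq 1 \to x \leftarrow 2 \\
{[\,]} & K_a\, x \neq 2 \to x \leftarrow 1 \quad \mathbf{fi}
\end{array}
\]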


Encoding the integer $x \in \{0, 1, 2, 3\}$ by two bits $q_1$ and $q_2$, we model the problem as the
following epistemically guarded transition system $\Gamma_{vs} = (B_{vs}, \mathcal{T}_{vs})$ with $B_{vs} = (S_{vs},
E_{vs}, L_{vs}, S_{vs,0})$ over $\Sigma_{vs} = (P_{vs}, A_{vs})$ with $P_{vs} = \{q_1, q_2\}$ and $A_{vs} = \{a\}$:
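
[Figure: transition graph of $\Gamma_{vs}$ over the states $s_0, s_1, s_2, s_3$ with observability set $O_{vs,a} = \emptyset$ and transitions from $s_0$ guarded by $K_a \neg(q_1 \wedge \neg q_2)$? and $K_a \neg(\neg q_1 \wedge q_2)$?.]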

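$O_{vs,a} = \emptyset$ represents a “blind” agent $a$ that deems all states equally accessible. State $s_3$ is
definitely not reachable. $\mathcal{T}_{vs}$ consists of the epistemically guarded actions


\[ K_a \neg(q_1 \wedge \neg q_2) \supset \{(s_0, s_1)\} \quad \text{and} \quad K_a \neg(\neg q_1 \wedge q_2) \supset \{(s_0, s_2)\}. \]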


3.2 Interpreting Epistemically Guarded Transition Systems 


An epistemically guarded transition system $\Gamma = (S, E, L, S_0, \mathcal{T})$ over $(P, A)$ is interpreted
over an epistemic transition structure $M \in \mathcal{M}_{P,A}(S, E, L, S_0)$ by interpreting
each guarded action $(\varphi \supset B) \in \mathcal{T}$ w.r.t. $M$ as

\[ (\varphi \supset B)^M = \{ (s, s') \in B \mid s \in S_\omega(M) \text{ and } M, s \models \varphi \}, \]

and combining these interpretations into the epistemic transition structure

\[ \Gamma^M = (S, E, L, S_0, \bigcup_{\tau \in \mathcal{T}} \tau^M). \]

We call $M$ a solution for $\Gamma$ if $\Gamma^M = M$.


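Example 2. For the bit transmission problem as described in Ex. 1(a), the epistemic
transition structure $M_{bt} = (B_{bt}, T_{bt})$ with $T_{bt} = \{(z_i, z_i) \mid i \in \{0, 1, 3, 4, 5, 7\}\} \cup
\{(z_0, z_1), (z_1, z_3), (z_4, z_5), (z_5, z_7)\}$ satisfies $\Gamma_{bt}^{M_{bt}} = M_{bt}$. This structure just omits
the states $z_2$ and $z_6$ with $L_{bt}(z_2) = \{\mathit{ack}\}$ and $L_{bt}(z_6) = \{\mathit{sbit}, \mathit{ack}\}$ which are definitely
not reachable, as $K_R\,\mathit{sbit}$ is false in $z_0 \sim_{O_{bt,R}} z_4$. Indeed,

\[
\begin{array}{lcl}
M_{bt}, s \models \neg K_S K_R\,\mathit{sbit} & \iff & s \in \{z_0, z_1, z_4, z_5\} \\
M_{bt}, s \models K_R\,\mathit{sbit} & \iff & s \in \{z_1, z_3, z_5, z_7\} \\
M_{bt}, s \models \neg K_R K_S K_R\,\mathit{sbit} & \iff & s \in \{z_0, z_1, z_3, z_4, z_5, z_7\}
\end{array}
\]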
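The claim of Ex. 2 can be checked mechanically. The following Python sketch makes several assumptions of our own: states $z_i$ are represented by the integers 0 to 7, the labels of $z_3$, $z_4$ and $z_5$ (not spelled out in the text) are inferred from the transitions of Ex. 1(a), and the guard abbreviation kr is encoded as "R knows $\neg\mathit{sbit}$ or R knows $\mathit{sbit}$". All function names are hypothetical.

```python
# State labels L_bt; the labels of z3, z4, z5 are inferred (assumption).
LABEL = {0: set(), 1: {"snt"}, 2: {"ack"}, 3: {"snt", "ack"},
         4: {"sbit"}, 5: {"sbit", "rbit", "snt"},
         6: {"sbit", "ack"}, 7: {"sbit", "rbit", "snt", "ack"}}
INIT = {0, 4}
OBS = {"S": {"sbit", "ack"}, "R": {"rbit", "snt"}}


def indist(a, s, t):
    """s ~_{O_a} t: both states agree on every proposition agent a observes."""
    return all((p in LABEL[s]) == (p in LABEL[t]) for p in OBS[a])


def reachable(T):
    seen, todo = set(INIT), list(INIT)
    while todo:
        s = todo.pop()
        for (u, v) in T:
            if u == s and v not in seen:
                seen.add(v)
                todo.append(v)
    return seen


def sat(reach, s, f):
    """M, s |= f, evaluated over the reachable states `reach` of M."""
    tag = f[0]
    if tag == "prop":
        return f[1] in LABEL[s]
    if tag == "not":
        return not sat(reach, s, f[1])
    if tag == "and":
        return sat(reach, s, f[1]) and sat(reach, s, f[2])
    if tag == "or":
        return sat(reach, s, f[1]) or sat(reach, s, f[2])
    if tag == "K":
        return all(sat(reach, t, f[2]) for t in reach if indist(f[1], s, t))
    raise ValueError(tag)


sbit = ("prop", "sbit")
kr = ("or", ("K", "R", ("not", sbit)), ("K", "R", sbit))   # R knows the bit
ksr = ("K", "S", kr)
krsr = ("K", "R", ksr)
LOOPS = {(i, i) for i in range(8)}
ACTIONS = [
    (("not", ksr), LOOPS | {(0, 1), (2, 3), (4, 5), (6, 7)}),                # send
    (("and", kr, ("not", krsr)), LOOPS | {(0, 2), (1, 3), (4, 6), (5, 7)}),  # acknowledge
]


def interpret(T):
    """Transition relation of Gamma^M for the structure M with transitions T."""
    reach = reachable(T)
    return {(s, t) for guard, B in ACTIONS for (s, t) in B
            if s in reach and sat(reach, s, guard)}


T_BT = {(i, i) for i in (0, 1, 3, 4, 5, 7)} | {(0, 1), (1, 3), (4, 5), (5, 7)}
is_solution = interpret(T_BT) == T_BT
```

Under these assumptions `is_solution` holds, and evaluating the three guards over the reachable states reproduces the three state sets listed in Ex. 2.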


However, finding a solution is complicated by the fact that the functional of interpreting 
an epistemically guarded transition system over an epistemic transition structure is not 
monotone, in general, as illustrated by the following examples. 


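Example 3. (a) Continuing Ex. 1(b) for the variable setting problem $\Gamma_{vs}$, consider the
epistemic transition structure $M_{vs,0} \in \mathcal{M}_{\Sigma_{vs}}(B_{vs})$ with the empty transition relation
$T(M_{vs,0}) = \emptyset$, and hence $S_\omega(M_{vs,0}) = \{s_0\}$. Setting $M_{vs,i+1} = \Gamma_{vs}^{M_{vs,i}}$ for $0 \le i \le 2$
we obtain successively

\[
\begin{array}{llll}
\tau & \tau^{M_{vs,0}} & \tau^{M_{vs,1}} & \tau^{M_{vs,2}} \\
K_a \neg(q_1 \wedge \neg q_2) \supset \{(s_0, s_1)\} & \{(s_0, s_1)\} & \emptyset & \{(s_0, s_1)\} \\
K_a \neg(\neg q_1 \wedge q_2) \supset \{(s_0, s_2)\} & \{(s_0, s_2)\} & \emptyset & \{(s_0, s_2)\}
\end{array}
\]

In particular, $M_{vs,2} = \Gamma_{vs}^{M_{vs,1}} = M_{vs,0}$. However, for $M_{vs,4}, M_{vs,5} \in
\mathcal{M}_{\Sigma_{vs}}(B_{vs})$ with $T(M_{vs,4}) = \{(s_0, s_1)\}$ and $T(M_{vs,5}) = \{(s_0, s_2)\}$ we obtain that
$\Gamma_{vs}^{M_{vs,4}} = M_{vs,4}$ and $\Gamma_{vs}^{M_{vs,5}} = M_{vs,5}$.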
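The oscillation of Ex. 3(a) can be replayed with a few lines of Python. This is a sketch under assumptions: the labelling $L_{vs}$ is not given explicitly in the text, so the labels below are inferred from the guards and the table ($s_1$ satisfies $\neg q_1 \wedge q_2$, $s_2$ satisfies $q_1 \wedge \neg q_2$), and since agent $a$ is blind, a knowledge guard is evaluated over all reachable states at once. The names `knows_not` and `interpret` are hypothetical.

```python
# Assumed labelling of B_vs, inferred from the guards and the table of Ex. 3(a).
LABEL = {"s0": set(), "s1": {"q2"}, "s2": {"q1"}, "s3": {"q1", "q2"}}
INIT = {"s0"}


def reachable(T):
    seen = set(INIT)
    while True:
        grown = seen | {t for (s, t) in T if s in seen}
        if grown == seen:
            return seen
        seen = grown


def knows_not(q_pos, q_neg):
    """Guard K_a not(q_pos and not q_neg) for the blind agent a."""
    return lambda reach: not any(
        q_pos in LABEL[s] and q_neg not in LABEL[s] for s in reach)


ACTIONS = [
    (knows_not("q1", "q2"), ("s0", "s1")),   # K_a not(q1 and not q2)
    (knows_not("q2", "q1"), ("s0", "s2")),   # K_a not(not q1 and q2)
]


def interpret(T):
    """Transition relation of Gamma_vs^M for the structure M with transitions T."""
    reach = reachable(T)
    return {edge for guard, edge in ACTIONS if edge[0] in reach and guard(reach)}


# M_vs,0 has the empty transition relation; iterate the functional three times.
sequence = [set()]
for _ in range(3):
    sequence.append(interpret(sequence[-1]))
```

The computed sequence alternates between the empty relation and the relation containing both transitions, while each of the two single-transition relations is reproduced by `interpret`, matching the two solutions named above.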

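(b) For capturing the cycle-breaking variable setting of the introduction consider the
following epistemically guarded transition system $\Gamma_{vsb} = (B_{vs}, \mathcal{T}_{vsb})$ over $\Sigma_{vs}$ that
shares the epistemic state basis $B_{vs}$ with Ex. 1(b):

[Figure: transition graph of $\Gamma_{vsb}$ with transitions from $s_0$ guarded by $K_a \neg(q_1 \wedge \neg q_2)$? and $K_a \neg(\neg q_1 \wedge q_2)$?.]

For $M_{vsb,0} = (B_{vs}, \emptyset)$ with $S_\omega(M_{vsb,0}) = \{s_0\}$, and setting $M_{vsb,i+1} = \Gamma_{vsb}^{M_{vsb,i}}$ for
$0 \le i \le 3$ we obtain successively

\[
\begin{array}{lllll}
\tau & \tau^{M_{vsb,0}} & \tau^{M_{vsb,1}} & \tau^{M_{vsb,2}} & \tau^{M_{vsb,3}} \\
K_a \neg(q_1 \wedge \neg q_2) \supset \{(s_0, s_1)\} & \{(s_0, s_1)\} & \emptyset & \emptyset & \emptyset \\
\mathrm{true} \supset \{(s_0, s_2)\} & \{(s_0, s_2)\} & \{(s_0, s_2)\} & \{(s_0, s_2)\} & \{(s_0, s_2)\} \\
K_a \neg(\neg q_1 \wedge q_2) \supset \{(s_0, s_3)\} & \{(s_0, s_3)\} & \emptyset & \{(s_0, s_3)\} & \{(s_0, s_3)\}
\end{array}
\]

For $M_{vsb,3}$ with $S_\omega(M_{vsb,3}) = \{s_0, s_2, s_3\}$ it finally holds that $\Gamma_{vsb}^{M_{vsb,3}} = M_{vsb,3}$.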


3.3 Iteration Semantics 


For illustrating the non-monotonicity of the interpretation functional we have started
the interpretation sequence for $\Gamma$ with the smallest epistemic transition structure, which
suggests looking for a smallest fixed point, which need not exist. De Haan et al. [10] argue
that a substitute consisting of the greatest fixed point would be more liberal. They construct
a transfinite approximation sequence starting from an $N_0$ having all states reachable. For a
successor ordinal $\alpha + 1$, the approximation $N_{\alpha+1}$ is just the interpretation of $\Gamma$ in $N_\alpha$; for a
limit ordinal $\lambda$, the approximation $N_\lambda = \bigcap_{\alpha < \lambda} \bigcup_{\alpha < \beta < \lambda} N_\beta$ is “the intersection of unions
of approximations that are sufficiently close to the limit” [10, p. 269]. The latter is preferred
over a union of intersections as it includes more states which implies less knowledge,
such that “agents [know] facts only when there are good reasons for them” (ibid.). Due to
cardinality reasons, the ordinal $\eta_\Gamma = \inf\{\alpha \mid \text{ex. } \beta \text{ s.t. } \alpha < \beta \text{ and } N_\alpha = N_\beta\}$ exists. If
$N_{\alpha+1} \subseteq N_\alpha$ for all $\alpha \ge \eta_\Gamma$, then $N_{\eta_\Gamma + 1} = N_{\eta_\Gamma}$; otherwise there is some $\alpha \ge \eta_\Gamma$ such
that $N_{\alpha+1} \not\subseteq N_\alpha$. Thus $\alpha_\Gamma = \inf\{\alpha \mid \eta_\Gamma \le \alpha \text{ and } (N_\alpha = N_{\alpha+1} \text{ or } N_{\alpha+1} \not\subseteq N_\alpha)\}$
exists and the iteration semantics of $\Gamma$ is defined as $N_{\alpha_\Gamma}$. This yields the greatest fixed
point if the interpretation functional is monotone.


Example 4. (a) For the variable setting problem $\Gamma_{vs}$ of Ex. 1(b) the interpretation
sequence $(N_{vs,\alpha})_{0 \le \alpha}$ starts with $N_{vs,0}$ showing $T(N_{vs,0}) = S_{vs} \times S_{vs}$. Using the
epistemic transition structures from Ex. 3(a) it holds that $N_{vs,k+1} = \Gamma_{vs}^{N_{vs,k}} = M_{vs,2}$
for $k$ even and $N_{vs,k+1} = M_{vs,1}$ for $k \ge 1$ odd. Thus, $N_{vs,1} = N_{vs,3}$ such that
$\eta_{\Gamma_{vs}} = 1 = \alpha_{\Gamma_{vs}}$, since $T(N_{vs,2}) = \{(s_0, s_1), (s_0, s_2)\} \not\subseteq \emptyset = T(N_{vs,1})$.
Hence the iteration semantics of $\Gamma_{vs}$ is given by $N_{vs,1} = M_{vs,2}$; since its transition
relation is empty, $\Gamma_{vs}$ has the same iteration semantics as an epistemically guarded
transition system without any guarded actions.

(b) Computing the iteration semantics sequence $(N_{vsb,k})_{0 \le k}$ of the cycle-breaking
variable setting $\Gamma_{vsb}$ of Ex. 3(b) proceeds as $N_{vsb,k} = M_{vsb,k+1}$. Since this time the
functional is monotone from $\alpha = 1$ onwards, the iteration semantics is $N_{vsb,2}$.


(c) Consider the following epistemically guarded transition system $\Gamma_{nc} = (B_{vs}, \mathcal{T}_{nc})$
over $\Sigma_{vs}$ that shares the epistemic basis $B_{vs}$ with the variable setting problem $\Gamma_{vs}$ of (a)
and only adds the guarded action $K_a \neg q_2 \supset \{(s_0, s_3)\}$:

[Figure: transition graph of $\Gamma_{nc}$, extending the graph of $\Gamma_{vs}$ by a transition from $s_0$ to $s_3$ guarded by $K_a \neg q_2$?.]

The interpretation process runs as for $\Gamma_{vs}$, and the epistemic transition structure with
the empty transition relation is also the iteration semantics of $\Gamma_{nc}$. This time, however,
there is a unique non-empty interpretation, viz. the transition structure consisting only of
$(s_0, s_1)$. Finding this solution is not constructive and some speculation is necessary: there
is no solution where $s_2$ is reachable; if $s_2$ were reachable, then $s_1$ would be reachable
leading to a contradiction due to the (non-)reachability of $s_3$. Thus only the possibility of
$s_0$ and $s_1$ being reachable, and $s_2$ and $s_3$ unreachable, remains.


(d) For the epistemically guarded transition system $\Gamma_{may}$ over $(\{p\}, \{a\})$ given by

[Figure: transition graph of $\Gamma_{may}$ over the states $u_0$ and $u_1$ with observability set $O_{may,a} = \emptyset$ and a transition from $u_0$ to $u_1$ guarded by $M_a\,p$?.]

the iteration process when started with $N_{may,0}$ having $T(N_{may,0}) = \{u_0, u_1\} \times \{u_0, u_1\}$
evaluates $M_a\,p$ to true and we obtain $N_{may,1}$ with $T(N_{may,1}) = \{(u_0, u_1)\}$ which in turn
is confirmed by the next iteration yielding a fixed point. This iteration semantics, however,
has a touch of a “vaticinium ex eventu”: $p$ can be reached since $p$ may be reached.
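For finite systems the approximation sequence of the iteration semantics is eventually periodic, so $\eta_\Gamma$ and $\alpha_\Gamma$ can be found without limit ordinals. The sketch below covers only this finite case and rests on assumptions: the labelling of $B_{vs}$ is inferred as in Ex. 3, guards of the blind agent are evaluated over all reachable states, each guarded action carries a single transition, and all names (`iteration_semantics`, `VS`, `VSB`) are hypothetical.

```python
from itertools import product

# Assumed labelling of the shared basis B_vs (inferred from the examples).
LABEL = {"s0": set(), "s1": {"q2"}, "s2": {"q1"}, "s3": {"q1", "q2"}}
STATES, INIT = set(LABEL), {"s0"}


def reachable(T):
    seen = set(INIT)
    while True:
        grown = seen | {t for (s, t) in T if s in seen}
        if grown == seen:
            return seen
        seen = grown


def knows_not(q_pos, q_neg):
    """Guard K_a not(q_pos and not q_neg) for the blind agent a."""
    return lambda reach: not any(
        q_pos in LABEL[s] and q_neg not in LABEL[s] for s in reach)


def always(reach):
    return True


VS = [(knows_not("q1", "q2"), ("s0", "s1")),
      (knows_not("q2", "q1"), ("s0", "s2"))]
VSB = [(knows_not("q1", "q2"), ("s0", "s1")),
       (always, ("s0", "s2")),
       (knows_not("q2", "q1"), ("s0", "s3"))]


def interpret(actions, T):
    reach = reachable(T)
    return {edge for guard, edge in actions if edge[0] in reach and guard(reach)}


def iteration_semantics(actions):
    """Return (alpha, T(N_alpha)) for a finite system, following Sect. 3.3."""
    seq = [set(product(STATES, STATES))]       # N_0: every state is reachable
    while seq[-1] not in seq[:-1]:             # iterate until a repetition occurs
        seq.append(interpret(actions, seq[-1]))
    eta = seq.index(seq[-1])                   # least index whose value recurs
    alpha = eta
    while True:
        if alpha + 1 == len(seq):
            seq.append(interpret(actions, seq[-1]))
        nxt = seq[alpha + 1]
        if nxt == seq[alpha] or not nxt <= seq[alpha]:
            return alpha, seq[alpha]
        alpha += 1


alpha_vs, sem_vs = iteration_semantics(VS)
alpha_vsb, sem_vsb = iteration_semantics(VSB)
```

Under these assumptions the sketch returns the empty relation at index 1 for the cyclic variable setting and the relation with the transitions to $s_2$ and $s_3$ at index 2 for the cycle-breaking variant, in line with Ex. 4(a) and (b).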


3.4 Unique Interpretation Solutions 


A knowledge-based program can be executed reliably just step by step if each knowledge 
guard can be stably decided based on what has been computed up to the current point of 
execution. In particular, in order to obtain a solution by execution, knowledge must not 
be invalidated by information only to be gained later on. Conversely, if all knowledge 
guards can be decided by just looking to the past, there is at most a single solution. 
Based on this observation, Fagin et al. [13,14] develop a formal characterisation of 
unique interpretability by capturing the notion that solutions “depend on the past”. They
then show that “providing epistemic witnesses” is a sufficient criterion for “dependence on 
the past”, which in turn always holds for “synchronous” programs. We briefly summarise 
their main line of argument adapting the demonstration from their run-based account for 
knowledge-based programs to our state-based epistemically guarded transition systems.3


3 The proofs are available in a long version at https://arxiv.org/abs/2301.10807.

An epistemic formula $\varphi \in \Phi_{P,A}$ is said to depend on the past w.r.t. a class of epistemic
transition structures $\mathfrak{M} \subseteq \mathcal{M}_{P,A}(B)$ if for all $M_1, M_2 \in \mathfrak{M}$ and all $k \in \mathbb{N}$ it holds that
$T_k(M_1) = T_k(M_2)$ implies $M_1, s \models \varphi \iff M_2, s \models \varphi$ for all $s \in S_k(M_1) \cap S_k(M_2)$;
an epistemically guarded transition system $\Gamma = (B, \mathcal{T})$ over $(P, A)$ is depending on the
past w.r.t. $\mathfrak{M}$ if every $\varphi$ in $(\varphi \supset B) \in \mathcal{T}$ depends on the past w.r.t. $\mathfrak{M}$.


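Example 5. For Ex. 3(a) neither $K_a \neg(q_1 \wedge \neg q_2)$ nor $K_a \neg(\neg q_1 \wedge q_2)$ depends on the past
w.r.t. $\{M_{vs,0}, M_{vs,1}\}$. In particular, $T_0(M_{vs,0}) = \emptyset = T_0(M_{vs,1})$ and $S_0(M_{vs,0}) =
\{s_0\} = S_0(M_{vs,1})$, but $M_{vs,0}, s_0 \models K_a \neg(q_1 \wedge \neg q_2)$ and $M_{vs,1}, s_0 \not\models K_a \neg(q_1 \wedge \neg q_2)$.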
Similarly for Ex. 3(b), these two formulæ do not depend on the past w.r.t. $\{M_{vsb,0},
M_{vsb,1}, M_{vsb,2}, M_{vsb,3}\}$, but they do w.r.t. $\{M_{vsb,1}, M_{vsb,2}, M_{vsb,3}\}$.


An epistemically guarded transition system $\Gamma$ has at most one solution if, and only if, it
depends on the past w.r.t. all its solutions. Due to the dependence on the past the successive
reachable transition relations $T_k(M)$ of all solutions $M = \Gamma^M$, i.e., their pasts, coincide.


Proposition 1. Let $\Gamma = (B, \mathcal{T})$ be an epistemically guarded transition system over
$\Sigma$. Then $\Gamma$ has at most one solution if, and only if, there is an $\mathfrak{M} \subseteq \mathcal{M}_\Sigma(B)$ with
$\{M \in \mathcal{M}_\Sigma(B) \mid \Gamma^M = M\} \subseteq \mathfrak{M}$ such that $\Gamma$ depends on the past w.r.t. $\mathfrak{M}$.


In order to obtain a solution of $\Gamma$ by execution, the system is interpreted repeatedly
to construct the approximations $(M_k)_{0 \le k}$ with $M_{k+1} = \Gamma^{M_k}$ for $k \ge -1$ starting with
some $M_{-1}$. Each approximation $M_k$ with $k \ge 0$ contributes a transition relation $T_k(M_k)$
which can be combined into a limit $M_\omega$. If $\Gamma$ depends on the past w.r.t. the class of
epistemic transition structures from which the approximands are constructed and which
also contains the limit, then the interpretation of the limit $M_\omega$ yields a fixed point.


Proposition 2. Let $\Gamma = (B, \mathcal{T})$ be an epistemically guarded transition system over $\Sigma$,
let $\mathfrak{M} \subseteq \mathcal{M}_\Sigma(B)$ such that $\Gamma^M \in \mathfrak{M}$ for every $M \in \mathfrak{M}$ and $(B, \bigcup_{0 \le k} T_k(M_k)) \in \mathfrak{M}$
for all $(M_k)_{0 \le k} \subseteq \mathfrak{M}$ with $T_k(M_{k'}) = T_k(M_k)$ for all $k' \ge k \ge 0$, and let $\Gamma$
depend on the past w.r.t. $\mathfrak{M}$. Let $M_{-1} \in \mathfrak{M}$, $M_{i+1} = \Gamma^{M_i}$ for all $i \ge -1$, and
$M_\omega = (B, \bigcup_{0 \le k} T_k(M_k))$. Then $\Gamma^{M_\omega} = M_\omega$.


A sufficient criterion for obtaining a comprehensive class of epistemic transition
structures $\mathfrak{M}$ such that $\Gamma$ depends on the past w.r.t. $\mathfrak{M}$ is provided by epistemic witnesses:
If some knowledge formula $K_a\,\varphi$ of $\Gamma$ does not hold at some state of an interpreting
epistemic transition structure there is evidence in the past of this structure why it does not
hold. Formally, a structure $M \in \mathcal{M}_{P,A}(B)$ provides epistemic witnesses for a formula
$K_a\,\varphi \in \Phi_{P,A}$ if for all $k \ge 0$, $s \in S_k(M)$ it holds that if $M, s \not\models K_a\,\varphi$, then there is an
$s' \in S_k(M)$ with $(s, s') \in E_a$ and $M, s' \not\models \varphi$.


Lemma 1. Let $\Gamma = (B, \mathcal{T})$ be an epistemically guarded transition system over $\Sigma$ and
let $\mathfrak{M} \subseteq \mathcal{M}_\Sigma(B)$ such that all $M \in \mathfrak{M}$ provide epistemic witnesses for all knowledge
guards in $\Gamma$. Then $\Gamma$ is depending on the past w.r.t. $\mathfrak{M}$.


A sufficient criterion, in turn, for a structure $M \in \mathcal{M}_{P,A}(S, E, L, S_0)$ to provide
epistemic witnesses is $M$ being synchronous, that is, for all $a \in A$ and all reachable
$s_1 \in S_{k_1}(M)$ and $s_2 \in S_{k_2}(M)$ with $(s_1, s_2) \in E_a$ it holds that $s_1, s_2 \in S_{\min\{k_1, k_2\}}(M)$. In
a synchronous structure the temporal and the epistemic dimension for each agent are hence
tightly coupled and agents cannot access the future, but also do not need to know the future.


Example 6. The interpretation $M_{bt}$ of the bit transmission problem given in Ex. 2
provides epistemic witnesses, but is not synchronous: the sender $S$ cannot distinguish $z_0$
reachable at depth 0 of $M_{bt}$ from $z_1$ that is only reachable at depth 1, and similarly the
receiver $R$ cannot distinguish $z_1$ from $z_3$ at the respective depths of 1 and 2.
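The synchrony condition amounts to requiring that indistinguishable reachable states are first reached at the same depth. The following Python sketch checks this for $M_{bt}$; it assumes the integer encoding of the states $z_0, \ldots, z_7$ and the inferred labels of $z_3$, $z_4$, $z_5$ used for Ex. 2, and the function names are hypothetical.

```python
# State labels L_bt; the labels of z3, z4, z5 are inferred (assumption).
LABEL = {0: set(), 1: {"snt"}, 2: {"ack"}, 3: {"snt", "ack"},
         4: {"sbit"}, 5: {"sbit", "rbit", "snt"},
         6: {"sbit", "ack"}, 7: {"sbit", "rbit", "snt", "ack"}}
INIT = {0, 4}
OBS = {"S": {"sbit", "ack"}, "R": {"rbit", "snt"}}
T_BT = {(i, i) for i in (0, 1, 3, 4, 5, 7)} | {(0, 1), (1, 3), (4, 5), (5, 7)}


def depths(T):
    """Map each reachable state s to the least k with s in S_k(M)."""
    depth = {s: 0 for s in INIT}
    frontier, k = set(INIT), 0
    while frontier:
        k += 1
        frontier = {t for (s, t) in T if s in frontier and t not in depth}
        for t in frontier:
            depth[t] = k
    return depth


def indist(a, s, t):
    """s ~_{O_a} t: both states agree on every proposition agent a observes."""
    return all((p in LABEL[s]) == (p in LABEL[t]) for p in OBS[a])


def synchrony_violations(T):
    """Triples (a, s, t): a cannot distinguish s from the later reached t."""
    depth = depths(T)
    return {(a, s, t) for a in OBS for s in depth for t in depth
            if indist(a, s, t) and depth[s] < depth[t]}


violations = synchrony_violations(T_BT)
is_synchronous = not violations
```

Among the reported violations are the two pairs named in Ex. 6, namely $(z_0, z_1)$ for the sender and $(z_1, z_3)$ for the receiver.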


An epistemically guarded transition system $\Gamma = (B, \mathcal{T})$ over $\Sigma$ provides epistemic
witnesses if for each $M \in \mathcal{M}_\Sigma(B)$ the interpretation $\Gamma^M$ provides epistemic witnesses
for all knowledge formulæ occurring in some of the action guards of $\Gamma$; $\Gamma$ is synchronous
if each $\Gamma^M$ is synchronous. Moreover, $\Gamma$ can syntactically be seen to be synchronous
(cf. [14, p. 135]) if it is round-based where all agents perform some action in each round
and record locally which actions they have taken.


4 (Re-)Interpreting Knowledge-based Programs 


The results by Fagin et al. [13,14] guarantee a unique interpretation for all synchronous 
knowledge-based programs; the approach by De Haan et al. [10] aims at extending the 
interpretation to asynchronous programs, but assigns semantics also to contradictory or 
self-fulfilling programs. 

The necessity of avoiding contradictory or self-fulfilling behaviour already occurs in 
the design of synchronous programming languages [6]: Their underlying principle is 
“perfect synchrony”, that any reaction of a program takes zero time and that thus whatever 
is output in reaction to some input is already present at the same time as the input. Since 
the presence or absence of signals can be tested, this requires “logical coherence” saying 
that a (non-input) signal is present in a reaction if, and only if, this signal is emitted in 
this very reaction. A program needs to be both reactive in the sense of leading to some 
logically coherent signal status, and determinate, i.e., not showing several such statuses. 
For example, in Esterel [7], the program fragment 


present S then nothing else emit S end 


is not reactive, but contradictory: signal S is only emitted if it is not emitted; and 


present S then emit S else nothing end 


is not determinate, but self-fulfilling: S is emitted if it is emitted, and it is not emitted if it 
is not. Such programs can be revealed by using a cycle-detecting static analysis, as is done 
in Lustre [18], or, for including more intricate cases, by Berry’s “constructive semantics”
as for Esterel [8]. Building on a “logical semantics” recording what is emitted in each step 
of execution, a must/cannot analysis is performed: what must/cannot be emitted, which 
branch must/cannot be executed. It is then required that for each signal it can be decided 
whether it must be present or it cannot be present. For example, in the parallel execution 




[ present S1 then emit S1 end ] 
|| [ present S1 then present S2 then nothing else emit S2 end end ] 

both signals can be emitted (if S1 is assumed to be present, and S2 absent), but
none must be emitted. Thus the constructive semantics does not reach a decision of what
must/cannot be present and the program is not constructive. Intriguingly, however, there
is exactly one coherent signal status that can be reached by execution: S1 and S2 absent.

We adapt Berry’s constructive semantics approach to knowledge-based programs. 
In fact, the first, non-reactive Esterel program fragment resembles the variable setting 
problem described in Ex. 3(a), the second, non-determinate fragment directly corresponds 
to Ex. 4(d), and the last, combined fragment is essentially the same as Ex. 4(c). We first 
define a must/can version of epistemic transition structures with a lower (must) and an 
upper bound (can). Based on a positive (must) and negative (cannot) satisfaction relation 
of epistemic formulæ over these structures we show how an epistemically guarded
transition system can be interpreted yielding another epistemic must/can transition
structure. For uniformity, we rephrase this interpretation in terms of the negation normal
form of formulæ and demonstrate that the constructive interpretation is always monotone
and leads to a least fixed point. For any knowledge-based program, this fixed point 
soundly shows which executions are necessary and which are possible. However, the 
fixed point need not be decided, and more can be possible than is necessary. We show 
that synchronous programs always lead to decided fixed points. 


4.1 Epistemic Must/Can Transition Structures 


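An epistemic must/can transition structure $\Upsilon = (S, E, L, S_0, (T_\mu, T_\nu))$ over $\Sigma = (P, A)$
is given by an epistemic state basis $B = (S, E, L, S_0)$ and two lower and upper transition
relations $T_\mu, T_\nu \subseteq S \times S$ with $T_\mu \subseteq T_\nu$. In particular, $\Upsilon_\mu = (B, T_\mu)$ and $\Upsilon_\nu = (B, T_\nu)$
are epistemic transition structures over $\Sigma$ with $\Upsilon_\mu \subseteq \Upsilon_\nu$.

The positive and negative satisfaction relations of an epistemic formula $\varphi \in \Phi_{P,A}$ over
the epistemic must/can transition structure $\Upsilon$ at a state $s \in S_\omega(\Upsilon_\nu)$, written $\Upsilon, s \models_p \varphi$
and $\Upsilon, s \models_n \varphi$, are defined as follows:

\[
\begin{array}{ll}
\Upsilon, s \models_p p \iff p \in L(s) & \Upsilon, s \models_n p \iff p \notin L(s) \\
\Upsilon, s \not\models_p \mathrm{false} & \Upsilon, s \models_n \mathrm{false} \\
\Upsilon, s \models_p \neg\varphi \iff \Upsilon, s \models_n \varphi & \Upsilon, s \models_n \neg\varphi \iff \Upsilon, s \models_p \varphi \\
\Upsilon, s \models_p \varphi_1 \wedge \varphi_2 \iff \Upsilon, s \models_p \varphi_1 \text{ and } \Upsilon, s \models_p \varphi_2 & \Upsilon, s \models_n \varphi_1 \wedge \varphi_2 \iff \Upsilon, s \models_n \varphi_1 \text{ or } \Upsilon, s \models_n \varphi_2 \\
\Upsilon, s \models_p K_a\,\varphi \iff \Upsilon, s' \models_p \varphi \text{ for all } s' \in S_\omega(\Upsilon_\nu) \text{ with } (s, s') \in E_a & \Upsilon, s \models_n K_a\,\varphi \iff \Upsilon, s' \models_n \varphi \text{ for some } s' \in S_\omega(\Upsilon_\mu) \text{ with } (s, s') \in E_a
\end{array}
\]

A formula is positively satisfied over $\Upsilon$ if it must be true given the upper bound $\Upsilon_\nu$ of
possible behaviour, it is negatively satisfied if it cannot be true given the lower bound $\Upsilon_\mu$
of necessary behaviour. In fact, it holds that what must be true can also be true:4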


Lemma 2. Let $\Upsilon = (S, E, L, S_0, (T_\mu, T_\nu))$ be an epistemic must/can transition structure
over $(P, A)$ and $\varphi \in \Phi_{P,A}$. Then for all $s \in S_\omega(\Upsilon_\nu)$, $\Upsilon, s \models_p \varphi$ implies $\Upsilon, s \not\models_n \varphi$.
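The two satisfaction relations can be sketched as a pair of mutually recursive functions. The encoding below is an assumption of ours: a must/can structure is a dictionary holding the labelling `L`, the accessibility relations `E`, and the reachable state sets `must` (for $S_\omega(\Upsilon_\mu)$) and `can` (for $S_\omega(\Upsilon_\nu)$); the function names and the two-state example are hypothetical.

```python
def pos(Y, s, f):
    """Y, s |=_p f: f must hold; K quantifies over the upper bound."""
    tag = f[0]
    if tag == "prop":
        return f[1] in Y["L"][s]
    if tag == "false":
        return False
    if tag == "not":
        return neg(Y, s, f[1])
    if tag == "and":
        return pos(Y, s, f[1]) and pos(Y, s, f[2])
    if tag == "K":
        return all(pos(Y, t, f[2]) for t in Y["can"] if (s, t) in Y["E"][f[1]])
    raise ValueError(tag)


def neg(Y, s, f):
    """Y, s |=_n f: f cannot hold; a refuting state lies in the lower bound."""
    tag = f[0]
    if tag == "prop":
        return f[1] not in Y["L"][s]
    if tag == "false":
        return True
    if tag == "not":
        return pos(Y, s, f[1])
    if tag == "and":
        return neg(Y, s, f[1]) or neg(Y, s, f[2])
    if tag == "K":
        return any(neg(Y, t, f[2]) for t in Y["must"] if (s, t) in Y["E"][f[1]])
    raise ValueError(tag)


# Hypothetical structure: u1 (where p holds) can be reached but need not be.
US = ("u0", "u1")
Y = {"L": {"u0": set(), "u1": {"p"}},
     "E": {"a": {(s, t) for s in US for t in US}},   # blind agent a
     "must": {"u0"}, "can": {"u0", "u1"}}
knows_not_p = ("K", "a", ("not", ("prop", "p")))
undecided = not pos(Y, "u0", knows_not_p) and not neg(Y, "u0", knows_not_p)
```

In this example $K_a \neg p$ is neither positively nor negatively satisfied at $u_0$: it is refuted only by a state that is possible but not necessary. Once lower and upper bound coincide, exactly one of the two relations holds.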


4 The proofs are available in a long version at https://arxiv.org/abs/2301.10807.
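
The set of epistemic must/can transition structures over $\Sigma$ and the epistemic state
basis $B$ is denoted by $\mathcal{Y}_\Sigma(B)$. We say that $\Upsilon_1 \sqsubseteq \Upsilon_2$ for $\Upsilon_1, \Upsilon_2 \in \mathcal{Y}_\Sigma(B)$ if $\Upsilon_{1,\mu} \subseteq \Upsilon_{2,\mu}$
and $\Upsilon_{1,\nu} \supseteq \Upsilon_{2,\nu}$: an extension raises the lower bound and reduces the upper bound.

As with epistemic transition structures, an epistemically guarded transition system
$\Gamma = (S, E, L, S_0, \mathcal{T})$ over $(P, A)$ can be interpreted over an epistemic must/can transition
structure $\Upsilon \in \mathcal{Y}_{P,A}(S, E, L, S_0)$: The interpretation of a guarded action $(\varphi \supset B) \in \mathcal{T}$
w.r.t. $\Upsilon$ is given by the pair $(\varphi \supset B)^{\Upsilon} = ((\varphi \supset B)^{\Upsilon,\mu}, (\varphi \supset B)^{\Upsilon,\nu})$ with

\[
\begin{array}{l}
(\varphi \supset B)^{\Upsilon,\mu} = \{ (s, s') \in B \mid s \in S_\omega(\Upsilon_\mu) \text{ and } \Upsilon, s \models_p \varphi \}, \\
(\varphi \supset B)^{\Upsilon,\nu} = \{ (s, s') \in B \mid s \in S_\omega(\Upsilon_\nu) \text{ and } \Upsilon, s \not\models_n \varphi \}.
\end{array}
\]

By Lem. 2 it holds that $\tau^{\Upsilon,\mu} \subseteq \tau^{\Upsilon,\nu}$ for each $\tau \in \mathcal{T}$. The constructive interpretation of
$\Gamma$ w.r.t. $\Upsilon$ is given by the epistemic must/can transition structure

\[ \Gamma^{\Upsilon} = (S, E, L, S_0, (\bigcup_{\tau \in \mathcal{T}} \tau^{\Upsilon,\mu}, \bigcup_{\tau \in \mathcal{T}} \tau^{\Upsilon,\nu})). \]

This is well defined, i.e., $(\Gamma^{\Upsilon})_\mu \subseteq (\Gamma^{\Upsilon})_\nu$. We call $\Upsilon$ a constructive solution for $\Gamma$ if
$\Gamma^{\Upsilon} = \Upsilon$; a constructive solution is decided if $\Upsilon_\mu = \Upsilon_\nu$.

Again as with epistemic transition structures, this interpretation over epistemic
must/can transition structures can be iterated for finally reaching a stable structure, and
this time interpretation turns out to be monotone.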


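Example 7. (a) Re-consider the cycle-breaking variable setting problem of Ex. 3(b). We
start the interpretation in $\Upsilon_{vsb,0} = (B_{vs}, (\emptyset, S_{vs}^2))$ and successively obtain the following
epistemic must/can transition structures:

\[
\begin{array}{lllll}
\tau & \tau^{\Upsilon_{vsb,0}} & \tau^{\Upsilon_{vsb,1}} & \tau^{\Upsilon_{vsb,2}} & \tau^{\Upsilon_{vsb,3}} \\
K_a \neg(q_1 \wedge \neg q_2) \supset \{(s_0, s_1)\} & (\emptyset, \{(s_0, s_1)\}) & (\emptyset, \emptyset) & (\emptyset, \emptyset) & (\emptyset, \emptyset) \\
\mathrm{true} \supset \{(s_0, s_2)\} & (\{(s_0, s_2)\}, \{(s_0, s_2)\}) & (\{(s_0, s_2)\}, \{(s_0, s_2)\}) & (\{(s_0, s_2)\}, \{(s_0, s_2)\}) & (\{(s_0, s_2)\}, \{(s_0, s_2)\}) \\
K_a \neg(\neg q_1 \wedge q_2) \supset \{(s_0, s_3)\} & (\emptyset, \{(s_0, s_3)\}) & (\emptyset, \{(s_0, s_3)\}) & (\{(s_0, s_3)\}, \{(s_0, s_3)\}) & (\{(s_0, s_3)\}, \{(s_0, s_3)\})
\end{array}
\]

Not only does it hold that $\Gamma_{vsb}^{\Upsilon_{vsb,3}} = \Upsilon_{vsb,3}$, but the interpretations indeed evolve
monotonically w.r.t. $\sqsubseteq$. Moreover, the structure $\Upsilon_{vsb,3}$ is decided and everything that
can happen also must happen, i.e., $(\Upsilon_{vsb,3})_\mu = (\Upsilon_{vsb,3})_\nu$.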


(b) For the cyclic variable setting problem, see Ex. 1(b) and Ex. 3(a), the interpretation
process is monotone, but only yields

\[
\begin{array}{lll}
\tau & \tau^{\Upsilon_{vs,0}} & \tau^{\Upsilon_{vs,1}} \\
K_a \neg(q_1 \wedge \neg q_2) \supset \{(s_0, s_1)\} & (\emptyset, \{(s_0, s_1)\}) & (\emptyset, \{(s_0, s_1)\}) \\
K_a \neg(\neg q_1 \wedge q_2) \supset \{(s_0, s_2)\} & (\emptyset, \{(s_0, s_2)\}) & (\emptyset, \{(s_0, s_2)\})
\end{array}
\]

The epistemic must/can transition structure $\Upsilon_{vs,1}$ is not decided, and indeed there are
two solutions of $\Gamma_{vs}$ in terms of epistemic transition structures. However, the same
undecidedness holds true for $\Gamma_{nc}$ of Ex. 4(c), that is, the unique solution is also missed
by the constructive interpretation.




4.2 Constructive Interpretation


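The separated positive (must) and negative (cannot) satisfaction relations over an epistemic
must/can transition structure $\Upsilon \in \mathcal{Y}_{P,A}(S, E, L, S_0)$ can be merged into a single, uniform
satisfaction relation relying on the negation normal form of epistemic formulæ where
negation only occurs in front of propositions. For an arbitrary $\varphi \in \Phi_{P,A}$ there exists an
equivalent $\mathrm{nnf}(\varphi) \in \Phi_{P,A}$ in negation normal form, such that, in particular

\[
\begin{array}{ll}
\mathrm{nnf}(\neg p) = \neg p & \mathrm{nnf}(\neg\neg\varphi) = \mathrm{nnf}(\varphi) \\
\mathrm{nnf}(\neg\mathrm{false}) = \mathrm{true} & \mathrm{nnf}(\neg(\varphi_1 \wedge \varphi_2)) = \mathrm{nnf}(\neg\varphi_1) \vee \mathrm{nnf}(\neg\varphi_2) \\
\mathrm{nnf}(\neg K_a\,\varphi) = M_a\,\mathrm{nnf}(\neg\varphi) &
\end{array}
\]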
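The clauses above determine a straightforward recursive translation. The Python sketch below uses a tuple encoding of formulæ that is our own assumption; the text lists only the five clauses shown, so the dual clauses for true, disjunction and $M_a$ are added here as the usual completions and should be read as assumptions, as should the name `nnf` as a Python function.

```python
def nnf(f):
    """Push negations inwards until they occur only in front of propositions."""
    tag = f[0]
    if tag in ("prop", "true", "false"):
        return f
    if tag in ("and", "or"):
        return (tag, nnf(f[1]), nnf(f[2]))
    if tag in ("K", "M"):
        return (tag, f[1], nnf(f[2]))
    if tag != "not":
        raise ValueError(f"unknown connective: {tag}")
    g = f[1]                                   # f is a negation of g
    inner = g[0]
    if inner == "prop":                        # nnf(not p) = not p
        return f
    if inner == "false":                       # nnf(not false) = true
        return ("true",)
    if inner == "true":                        # dual clause (assumption)
        return ("false",)
    if inner == "not":                         # nnf(not not g') = nnf(g')
        return nnf(g[1])
    if inner == "and":                         # De Morgan
        return ("or", nnf(("not", g[1])), nnf(("not", g[2])))
    if inner == "or":                          # dual clause (assumption)
        return ("and", nnf(("not", g[1])), nnf(("not", g[2])))
    if inner == "K":                           # nnf(not K_a g') = M_a nnf(not g')
        return ("M", g[1], nnf(("not", g[2])))
    if inner == "M":                           # dual clause (assumption)
        return ("K", g[1], nnf(("not", g[2])))
    raise ValueError(f"unknown connective: {inner}")


p, q = ("prop", "p"), ("prop", "q")
example = nnf(("not", ("K", "a", ("and", p, ("not", q)))))
```

For instance, the negated guard $\neg K_a (p \wedge \neg q)$ is translated to $M_a (\neg p \vee q)$.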

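The constructive satisfaction relation $\Upsilon, s \models \varphi$ for a state $s \in S_\omega(\Upsilon_\nu)$ and an epistemic
formula $\varphi \in \Phi_{P,A}$ in negation normal form is defined just as for arbitrary epistemic
formulæ, but using the upper bound $\Upsilon_\nu$ for the universal quantifier of $K_a$ and the lower
bound $\Upsilon_\mu$ for the existential quantifier of $M_a$; in particular,

\[
\begin{array}{lcl}
\Upsilon, s \models \neg p & \iff & p \notin L(s) \\
\Upsilon, s \models K_a\,\varphi & \iff & \Upsilon, s' \models \varphi \text{ f.a. } s' \in S_\omega(\Upsilon_\nu) \text{ with } (s, s') \in E_a \\
\Upsilon, s \models M_a\,\varphi & \iff & \text{ex. } s' \in S_\omega(\Upsilon_\mu) \text{ s.t. } (s, s') \in E_a \text{ and } \Upsilon, s' \models \varphi
\end{array}
\]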


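The constructive satisfaction relation indeed combines $\models_p$ and $\models_n$:

Lemma 3. Let $\Upsilon \in \mathcal{Y}_{P,A}(B)$, $\varphi \in \Phi_{P,A}$, and $s \in S_\omega(\Upsilon_\nu)$. Then $\Upsilon, s \models_p \varphi$ iff
$\Upsilon, s \models \mathrm{nnf}(\varphi)$ and $\Upsilon, s \models_n \varphi$ iff $\Upsilon, s \models \mathrm{nnf}(\neg\varphi)$.

It follows that if $\Upsilon_\mu = \Upsilon_\nu$, then $\Upsilon, s \models \varphi$ if, and only if, $\Upsilon_\mu, s \models \varphi$ or, equivalently,
$\Upsilon_\nu, s \models \varphi$. We also obtain that constructive satisfaction is preserved when extending
epistemic must/can transition structures:

Lemma 4. Let $\Upsilon, \Upsilon' \in \mathcal{Y}_{P,A}(B)$ with $\Upsilon \sqsubseteq \Upsilon'$ and let $\varphi \in \Phi_{P,A}$. Then $\Upsilon, s \models \mathrm{nnf}(\varphi)$
implies $\Upsilon', s \models \mathrm{nnf}(\varphi)$ for all $s \in S_\omega(\Upsilon'_\nu)$.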


This preservation of satisfaction yields that constructive interpretation is monotone. 


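Proposition 3. Let $\Gamma = (B, \mathcal{T})$ be an epistemically guarded transition system over $\Sigma$
and $\Upsilon, \Upsilon' \in \mathcal{Y}_\Sigma(B)$ such that $\Upsilon \sqsubseteq \Upsilon'$. Then $\Gamma^{\Upsilon} \sqsubseteq \Gamma^{\Upsilon'}$.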

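Finally, we can observe that $\mathcal{Y}_\Sigma(B)$ for $B = (S, E, L, S_0)$ with the ordering $\sqsubseteq$ is
an inductive partial order: each directed subset $\Delta \subseteq \mathcal{Y}_\Sigma(B)$ has a least upper bound
$\bigsqcup \Delta$ w.r.t. $\sqsubseteq$, where directed means that every two $\Upsilon_1, \Upsilon_2 \in \Delta$ have an upper bound
$\Upsilon \in \Delta$ such that $\Upsilon_1 \sqsubseteq \Upsilon$ and $\Upsilon_2 \sqsubseteq \Upsilon$; and there is also a bottom or least element
$\bot_{\Sigma,B} = (S, E, L, S_0, (\emptyset, S \times S)) \in \mathcal{Y}_\Sigma(B)$.


Proposition 4. $(\mathcal{Y}_\Sigma(B), \sqsubseteq, \bot_{\Sigma,B})$ is an inductive partial order.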


Pataraia’s fixed-point theorem [9, §8.22] now guarantees that the monotone operator
$\Upsilon \mapsto \Gamma^{\Upsilon}$ for each epistemically guarded transition system $\Gamma = (B, \mathcal{T})$ has a least fixed
point in the inductive partial order. It can be computed by, possibly transfinite, iterated
application of constructive interpretation to $\bot_{\Sigma,B}$, that is, $\Upsilon_0 = \bot_{\Sigma,B}$, $\Upsilon_{\alpha+1} = \Gamma^{\Upsilon_\alpha}$ for
a successor ordinal $\alpha + 1$, and $\Upsilon_\lambda = \bigsqcup_{\alpha < \lambda} \Upsilon_\alpha$ until equality [9, Exc. 8.19]. Compared
to the iteration semantics of Sect. 3.3, the computation of the constructive semantics thus
does not have to record all previous approximations in order to find a repetition.
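For a finite state space the least constructive fixed point is reached after finitely many applications of the operator, starting from the bottom element. The following Python sketch illustrates this for the two variable setting systems; it is not the authors' implementation. The labelling of $B_{vs}$ is inferred as before, the single agent is blind (so $K$ ranges over all reachable states of the respective bound and is written without an agent index), each action carries a single transition, and all names are hypothetical.

```python
# Assumed labelling of B_vs (inferred from the examples); the agent is blind.
LABEL = {"s0": set(), "s1": {"q2"}, "s2": {"q1"}, "s3": {"q1", "q2"}}
STATES, INIT = set(LABEL), {"s0"}


def reachable(T):
    seen = set(INIT)
    while True:
        grown = seen | {t for (s, t) in T if s in seen}
        if grown == seen:
            return seen
        seen = grown


def pos(must, can, s, f):
    """Positive satisfaction; K ranges over the upper bound `can`."""
    tag = f[0]
    if tag == "true":
        return True
    if tag == "prop":
        return f[1] in LABEL[s]
    if tag == "not":
        return neg(must, can, s, f[1])
    if tag == "and":
        return pos(must, can, s, f[1]) and pos(must, can, s, f[2])
    return all(pos(must, can, t, f[1]) for t in can)      # tag == "K"


def neg(must, can, s, f):
    """Negative satisfaction; K is refuted within the lower bound `must`."""
    tag = f[0]
    if tag == "true":
        return False
    if tag == "prop":
        return f[1] not in LABEL[s]
    if tag == "not":
        return pos(must, can, s, f[1])
    if tag == "and":
        return neg(must, can, s, f[1]) or neg(must, can, s, f[2])
    return any(neg(must, can, t, f[1]) for t in must)     # tag == "K"


def interpret(actions, Y):
    """One step of constructive interpretation on Y = (T_mu, T_nu)."""
    must, can = reachable(Y[0]), reachable(Y[1])
    lower = {e for g, e in actions if e[0] in must and pos(must, can, e[0], g)}
    upper = {e for g, e in actions if e[0] in can and not neg(must, can, e[0], g)}
    return lower, upper


def least_fixed_point(actions):
    Y = (set(), {(s, t) for s in STATES for t in STATES})  # bottom element
    while True:
        nxt = interpret(actions, Y)
        if nxt == Y:
            return Y
        Y = nxt


q1, q2 = ("prop", "q1"), ("prop", "q2")
G1 = ("K", ("not", ("and", q1, ("not", q2))))     # K_a not(q1 and not q2)
G2 = ("K", ("not", ("and", ("not", q1), q2)))     # K_a not(not q1 and q2)
VS = [(G1, ("s0", "s1")), (G2, ("s0", "s2"))]
VSB = [(G1, ("s0", "s1")), (("true",), ("s0", "s2")), (G2, ("s0", "s3"))]

fix_vs, fix_vsb = least_fixed_point(VS), least_fixed_point(VSB)
```

Under these assumptions the fixed point of the cycle-breaking system is decided, with lower and upper bound both consisting of the transitions to $s_2$ and $s_3$, whereas the fixed point of the cyclic system keeps an empty lower bound and both transitions in the upper bound, as in Ex. 7.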


4.3 (Un-)Decided Constructive Fixed Points


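If any constructive fixed point $\Upsilon = \Gamma^{\Upsilon}$ with $\Upsilon \in \mathcal{Y}_\Sigma(B)$ is decided, then there is
the solution $\Upsilon_\mu = \Gamma^{\Upsilon_\mu} = \Gamma^{\Upsilon_\nu} = \Upsilon_\nu$ in terms of epistemic transition structures, and
$\Gamma$ is not contradictory. Even if it is not decided, the must/can structures $\Upsilon_{\mu\mu} = (B,
(T(\Upsilon_\mu), T(\Upsilon_\mu))) \in \mathcal{Y}_\Sigma(B)$ and $\Upsilon_{\nu\nu} = (B, (T(\Upsilon_\nu), T(\Upsilon_\nu))) \in \mathcal{Y}_\Sigma(B)$ satisfy $\Upsilon \sqsubseteq \Upsilon_{\mu\mu}$
and $\Upsilon \sqsubseteq \Upsilon_{\nu\nu}$ such that by Prop. 3 we obtain $\Upsilon = \Gamma^{\Upsilon} \sqsubseteq \Gamma^{\Upsilon_{\mu\mu}}, \Gamma^{\Upsilon_{\nu\nu}}$ which yields
$\Upsilon_\mu \subseteq \Gamma^{\Upsilon_\mu}$ and $\Gamma^{\Upsilon_\nu} \subseteq \Upsilon_\nu$, but not equality, in general. For the least constructive fixed
point $\mu\Gamma$, any solution $M = \Gamma^M$ thus satisfies $(\mu\Gamma)_\mu \subseteq M \subseteq (\mu\Gamma)_\nu$, always giving
sound lower and upper bounds and, if $\mu\Gamma$ is decided, moreover unique solvability:


Proposition 5. Let $\Gamma = (B, \mathcal{T})$ be an epistemically guarded transition system over $\Sigma$
and assume $\mu\Gamma \in \mathcal{Y}_\Sigma(B)$ is decided. Then $\Gamma$ has a unique solution in $\mathcal{M}_\Sigma(B)$.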


Still, even for epistemically guarded transition systems that provide epistemic 
witnesses it is not guaranteed that the least constructive fixed point is decided: 


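Example 8. Consider the following epistemically guarded transition system $\Gamma_{nd} = (B_{nd},
\mathcal{T}_{nd})$ over $\Sigma_{nd} = (P_{nd}, A_{nd})$ with $P_{nd} = \{p, q\}$ and $A_{nd} = \{a, b\}$:

[Figure: transition graph of $\Gamma_{nd}$ over the states $u_0$ and $u_1$ with a transition from $u_0$ to $u_1$ guarded by $K_b M_a\,p$?.]

Constructive interpretation yields the non-decided fixed point $\Upsilon_{nd}$ with $T(\Upsilon_{nd,\mu}) = \emptyset$
and $T(\Upsilon_{nd,\nu}) = \{(u_0, u_1)\}$, as $\Upsilon_{nd}, u_0 \not\models K_b M_a\,p$, but also $\Upsilon_{nd}, u_0 \not\models M_b K_a \neg p$: the
states $u_0$ and $u_1$ can be distinguished by agent $a$, and agent $b$ cannot tell whether a step
has been taken. In $u_0$ the formula $M_a\,p$ holds w.r.t. $\Upsilon_{nd}$, but in $u_1$ it does not, since
$(u_1, u_0) \notin E_{nd,a}$. On the other hand, $\Gamma_{nd}$ provides epistemic witnesses pathologically,
since $\Gamma_{nd}^M, s \models K_b M_a\,p$ for any $M \in \mathcal{M}_{\Sigma_{nd}}(B_{nd})$ and any $s \in S_\omega(\Gamma_{nd}^M)$ and hence
has a unique interpretation, which in this case is $\Gamma_{nd}^{\Upsilon_{nd,\mu}} = \Upsilon_{nd,\nu} = \Gamma_{nd}^{\Upsilon_{nd,\nu}}$.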


For synchronous epistemically guarded transition systems, however, the least fixed 
point is decided, since all knowledge refers to a past that must have happened: 


Lemma 5. Let $\Gamma = (B, \mathcal{T})$ be an epistemically guarded transition system over $\Sigma$ that is
synchronous. Let $\Upsilon \in \mathcal{Y}_\Sigma(B)$ satisfy $\Gamma^\Upsilon = \Upsilon$. Then $\Upsilon$ is decided.


Summing up, the constructive approach to interpreting knowledge-based programs 
subsumes the solutions for synchronous programs and provides a sound procedure for 
obtaining lower and upper bounds for the execution of both synchronous and asynchronous 
programs. The approach, however, is not complete: If the least constructive fixed point $\mu\Gamma$
is undecided, a system $\Gamma$ may be contradictory without any solution (see Ex. 3(a)),
self-fulfilling with several solutions (see Ex. 4(d)), or it may have a unique solution in terms of
epistemic transition structures (see Ex. 4(c)). One strategy that suggests itself for analysing
T further is to check whether an interpretation using the lower bound (uT), of the 
least fixed point satisfies [7 )# = (ul), = PHE), which means that when executing 
according to what must happen all that can happen is already covered (see Ex. 8).


The “executions” of an epistemically guarded transition system $\Gamma$ can be captured as
derivations of two mutually dependent inductive rule systems, as used for inductive
definitions [1,19]. One rule system defines the reachability in $\Gamma$, the other one the
satisfaction of knowledge formulæ in negation normal form over $\Gamma$. When $\Gamma$ provides
epistemic witnesses, the mutual dependence can be resolved by stratifying the rule system
for reachability according to the depth of the execution. In the general case, the
non-monotone dependence of the formula satisfaction system on the reachability system (the
more states are reachable, the less is known) can be mitigated by extending the notion
of rule systems to include also negative premisses: The conclusion of a rule is derivable if
all its (positive) premisses are derivable, but none of its negative premisses. When applied
to knowledge formulæ, negative premisses express that no counterexample is reachable.

The general rule systems can also be read as logic programs with “negation as 
failure” [11]. A direct application of the must/can approximation technique to the general 
rule system or, equivalently, the logic program resulting from a knowledge-based program 
reconstructs the Kripke-Kleene fixed point; the possible solutions correspond to “stable 
models” [16]. 


5.1 Inductive Rule Systems 


An inductive rule system $R$ consists of rules of the form $X/y$ where the premisses $X \subseteq U$
and the conclusion $y \in U$ are drawn from some universe of judgements $U$. A rule $X/y$
is interpreted as “if all $X$ can be inferred, then $y$ can be inferred”. The derivations in $R$
together with their sets of premisses and conclusions are inductively defined as follows:


– a $y \in U$ is itself a derivation; its set of premisses is $\{y\}$, its conclusion is $y$;

– if $X/y \in R$ and $(d_x)_{x \in X}$ is a family of derivations with conclusions $(x)_{x \in X}$, then
$(d_x)_{x \in X}/y$ is a derivation; its set of premisses is the union of the premisses of $(d_x)_{x \in X}$,
its conclusion is $y$.


A $y \in U$ is derivable in $R$ if there is a derivation in $R$ with the empty set of premisses
and conclusion $y$. The set of derivable conclusions of $R$ coincides with the least fixed
point $\mu\hat{R}$ of $\hat{R} : \wp U \to \wp U$ defined by $\hat{R}(P) = \{y \in U \mid \text{ex. } X/y \in R \text{ s.t. } X \subseteq P\}$.
In logic programming terms, a rule $X/y \in R$ yields a Horn clause $y \leftarrow X$ [11]. The
least fixed point $\mu\hat{R}$ coincides with the minimal Herbrand model of the logic program
corresponding to $R$ and thus with the single stable model, as no negation is involved [11,16].
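
For a finite rule system, the least fixed point can be reached by iterating the operator from the empty set. The following Python sketch only illustrates this; the encoding of a rule $X/y$ as a pair `(X, y)` and the names `step`, `derivable`, and `EXAMPLE_RULES` are assumptions made for this example and do not stem from the formal development.

```python
from typing import FrozenSet, Hashable, List, Set, Tuple

# A rule X/y is encoded as (X, y): a frozenset of premisses and one conclusion.
Rule = Tuple[FrozenSet[Hashable], Hashable]


def step(rules: List[Rule], known: Set[Hashable]) -> Set[Hashable]:
    """One application of the operator: all y with a rule X/y such that X is a subset of known."""
    return {y for (xs, y) in rules if xs <= known}


def derivable(rules: List[Rule]) -> Set[Hashable]:
    """Least fixed point, computed by iteration from the empty set.

    Terminates for finite rule systems because the operator is monotone,
    so the iterates form an increasing chain.
    """
    known: Set[Hashable] = set()
    while True:
        nxt = step(rules, known)
        if nxt == known:
            return known
        known = nxt


# Hypothetical example: "a" is an axiom, "b" follows from "a", "c" from "a" and "b";
# "e" would need "d", which is never derivable.
EXAMPLE_RULES: List[Rule] = [
    (frozenset(), "a"),
    (frozenset({"a"}), "b"),
    (frozenset({"a", "b"}), "c"),
    (frozenset({"d"}), "e"),
]
```

Calling `derivable(EXAMPLE_RULES)` yields the set containing `"a"`, `"b"`, and `"c"`; in logic programming terms this is the minimal Herbrand model of the corresponding Horn clauses.
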
For expressing reachability and the satisfaction of knowledge formulæ in an epistemically
guarded transition system $\Gamma = (S, E, L, S_0, \mathcal{T})$ over $(P, A)$ as inductive rule
systems, we use two types of judgements, one of the form $s \in^\Gamma S_\omega$ with $s \in S$ for “state
$s$ is reachable in $\Gamma$”, and one of the form $s \models^\Gamma \varphi$ with $s \in S$ and $\varphi \in \Phi_{P,A}$ in negation
normal form for “state $s$ satisfies formula $\varphi$ in $\Gamma$”. The rules for reachability read:


; s € Su ifex. (pD B)ET, 
so ET Su s El Sy (s, 8’) € B, and s =T p 
where $s \models^\Gamma \varphi$ in the side condition of the second rule requires this judgement to be
derivable in the rule system for satisfaction. The rules for this system read:


ifsc? S, if s ET Su, if s ET Su, 
s H7 true sH? p pE L(s) s T vp p ¢ L(s) 
sH” gi sH w sH” yi s H” p2 
s HT p1 Apo sET p1 V po s ET y1 V y2 
SH if(s,s) € Ea (8 Er Y)sersu, (s,8°)€Ea 
s H? Ma se" Su s H? Kay 


Here, the last rule for satisfaction in fact is not monotone w.r.t. reachability: In order
to infer $s \models^\Gamma K_a \varphi$ it is not necessary to infer $s' \models^\Gamma \varphi$ for all $s'$ with $(s, s') \in E_a$, but
only for those for which $s' \in^\Gamma S_\omega$ can be deduced, and also for all of those.

The notion of providing epistemic witnesses allows us to stratify the inductive rule
systems according to the involved depth $k \geq 0$: We specialise the judgement $s \in^\Gamma S_\omega$ into
$s \in^\Gamma S_k$ meaning “state $s$ is reachable in $\Gamma$ in up to $k$ steps” and, similarly, the judgement
$s \models^\Gamma \varphi$ into $s \models^\Gamma_k \varphi$ meaning “formula $\varphi$ is satisfied in $\Gamma$ at state $s$ considering states
reachable in up to $k$ steps”. The rules for reachability become for all $k \geq 0$:


; r sel Sk  ifex. (pD B)ET, 
rgo US b zrg AEB ands ET 
So © Sk se Shay (s,s ) € D, and s Hk p 


Analogously, the rules for satisfaction become for all $k \geq 0$:


ae if s E} Sy, if s El Sp, 
F ifs € Sk F ame oer 
s H; true sł? p pE L(s) sl -=p p¢ Ls) 
sH yi SEE p2 s Hk Y1 s =} 2 
Lar T OT 
SEE Pi A Y2 SE, 91 V p2 SEE P1 V p2 
s A p if (s, s') E Ea, (s! Ep P)s'ET Sp, (s,8')€Ea 
S HE. May s! ET Sk S HE Kay 


In particular, the rules for $s \models^\Gamma_k M_a \varphi$ and $s \models^\Gamma_k K_a \varphi$ are sound for epistemically guarded
transition systems providing epistemic witnesses. The notion of “providing epistemic
witnesses” requires that, if $K_a \varphi$ does not hold at depth $k$, there is a counterexample to $\varphi$
at depth $\leq k$. The general case can be covered by dropping the depths and taking into
account that $K_a \varphi$ does not hold at some state $s$ if, and only if, there is some reachable,
$a$-indistinguishable state $s'$ at which $\varphi$ does not hold. Therefore, in order to derive that
$K_a \varphi$ indeed holds at some reachable state $s$, it is necessary and sufficient to show that it
is not possible to derive that $\neg\varphi$ holds at some reachable, $a$-indistinguishable state $s'$.


5.2 General Rule Systems with Positive and Negative Premisses 


For expressing negative information in terms of a rule system, we complement the
positive premisses of the rules by negative ones: We consider general rule systems $R$ over
a universe $U$ consisting of rules of the form $(X, \neg Z)/y$ where $X, Z \subseteq U$ are the positive
and negative premisses, and $y \in U$ is the conclusion; it is interpreted as “if all $X$ can be
inferred but no $Z$, then $y$ can be inferred”. The derivations in $R$ together with their sets of
positive and negative premisses and conclusions are again inductively defined as follows:


– a $y \in U$ is itself a derivation; its set of positive premisses is $\{y\}$, its set of negative
premisses is $\emptyset$, and its conclusion is $y$;

– if $(X, \neg Z)/y \in R$ and $(d_x)_{x \in X}$ is a family of derivations with conclusions $(x)_{x \in X}$,
then $((d_x)_{x \in X}, \neg Z)/y$ is a derivation; its set of positive premisses is the union of
the positive premisses of $(d_x)_{x \in X}$, its set of negative premisses is the union of the
negative premisses of $(d_x)_{x \in X}$ together with $Z$, and its conclusion is $y$.


For a $B \subseteq U$, let $R(B)$ be all those $y \in U$ such that there is a derivation of $y$ in $R$ with
the empty set of positive premisses and no negative premisses in $B$. The set of derivable
conclusions of $R$ is given by the least fixed point of $R$ if it exists.

From the logic programming perspective, a general rule $(X, \neg Z)/y \in R$ can be seen
as a clause of the form $y \leftarrow X, \neg Z$ with $\neg$ read as “negation as failure” [5,11]. Checking
that a $B \subseteq U$ is a “stable model” of the logic program obtained from $R$ in this way
corresponds to the following process on general rule systems: first the reduct $R_B$ is formed
by disregarding all rules $(X, \neg Z)/y \in R$ with $B \cap Z \neq \emptyset$ and transforming the remaining
rules $(X, \neg Z)/y \in R$ into $X/y \in R_B$; then $R_B$ is an inductive rule system and $B$ is stable
if $B = \mu\hat{R}_B$. In particular, the stable models correspond to the solutions of $R(B) = B$.

With this generalised notion of rule systems we can reformulate and combine the two
inference systems for reachability and satisfaction in an epistemically guarded transition
system $\Gamma = (S, E, L, S_0, \mathcal{T})$ over $(P, A)$ by using a single judgement $s \models^\Gamma_\omega \varphi$ for “state
$s$ satisfies $\varphi$ in $\Gamma$ and state $s$ is reachable in $\Gamma$”. A negative premiss $\neg(s \models^\Gamma_\omega \mathrm{true})$ thus
stands for “$s \in^\Gamma S_\omega$ cannot be deduced”. The new rules, which now also have negative premisses, read:


sH ge  ifex.(y> B)ET, 


if s Si ——— 
so KE, true : s' KF true (s,s')E B 
S ET true S EE true 
5p if p € L(s) ae a ¢ L(s) 
S =i p s Eu TP 
sH yi s H5 p2 s H5 pı s HZ 2 
CT _T T 
S =u Yi A p2 s Ho 91 V 92 s Eo 1 V 2 
See ey s EL true a(s! EE nnf(-))(s,0 ee, 
AEF RAN ous if (s, S ) E Ea F 
S Fy Ma 8 Fu Kay 


The rule for $s \models^\Gamma_\omega K_a \varphi$ checks that $s$ is reachable, but that no counterexample to $\varphi$ can
be reached at an $a$-indistinguishable state.

Using general rule systems, the solvability of an epistemically guarded transition
system is shifted to computing derivable conclusions. As for knowledge-based programs,
it is not obvious from just the rules of a system $R$ whether there are solutions of
$R(B) = B$ at all, and whether there is a least one.


Example 9. (a) The general rule system

$$R_0 = \left\{ \frac{x_1}{x_1},\ \frac{\neg x_1,\ \neg x_2}{x_2} \right\} \text{ over } \{x_1, x_2\}$$

has no set of derivable conclusions, since $R_0$ has no fixed point; in particular, $R_0(\emptyset) =
\{x_2\}$ and $R_0(\{x_1\}) = \emptyset = R_0(\{x_2\})$. In terms of stable models, computing $R_0(\emptyset)$
amounts to removing the negative premisses from the rule $(\emptyset, \neg\{x_1, x_2\})/x_2$, such that
the inductive rules $\{x_1\}/x_1$ and $\emptyset/x_2$ remain; and computing $R_0(\{x_i\})$ leads to the
single inductive rule $\{x_1\}/x_1$ for $i \in \{1, 2\}$.

$R_0$ also demonstrates that the set of derivable conclusions of a general rule system $R$
need not coincide with the least fixed point of the operator $\hat{R} : \wp U \to \wp U$ when transferred
from inductive rule systems by now setting $\hat{R}(P) = \{y \in U \mid \text{ex. } (X, \neg Z)/y \in
R \text{ s.t. } X \subseteq P,\ P \cap Z = \emptyset\}$: $\mu\hat{R}_0 = \{x_1\}$.

On the other hand, in view of the general rule system for epistemically guarded
transition systems, $R_0$ can also be rephrased as a knowledge-based program with a single
agent $a$ and a single variable $x \in \{0, 1, 2\}$, which $a$ cannot observe, started with $x = 0$:


if $M_a\, x = 1 \to x \leftarrow 1$
| $K_a (x \neq 1 \wedge x \neq 2) \to x \leftarrow 2$ fi


(b) There may be several solutions of a general rule system, but no least one:

$$R_1 = \left\{ \frac{\neg x_1}{x_3},\ \frac{\neg x_3}{x_1} \right\} \text{ over } \{x_1, x_3\}$$

has the solutions $\{x_1\}$ and $\{x_3\}$, but $\emptyset$ is no solution. It corresponds to the “variable
setting” knowledge-based program of the introduction, see Ex. 1(b):


if $K_a\, x \neq 1 \to x \leftarrow 3$
| $K_a\, x \neq 3 \to x \leftarrow 1$ fi


(c) Combining a contradictory rule $(\emptyset, \neg\{x_1, x_2\})/x_2$ with the non-determined rules of
$R_1$ we obtain the rule system

$$R = \left\{ \frac{\neg x_1}{x_3},\ \frac{\neg x_3}{x_1},\ \frac{\neg x_1,\ \neg x_2}{x_2} \right\} \text{ over } \{x_1, x_2, x_3\}$$

which has the unique solution $\{x_1\}$: if $x_3$ were inferable, i.e., $x_1$ not inferable, this would
trigger the contradictory rule for $x_2$ (see Ex. 4(c)).
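
The claims of Example 9 can be checked mechanically for such small universes by enumerating all candidate sets $B$ and testing stability via the reduct. The following Python sketch is an illustration under assumptions: a general rule $(X, \neg Z)/y$ is encoded as a triple `(X, Z, y)`, and the names `reduct`, `least_fixed_point`, `is_solution`, and `solutions` are chosen for this example only.

```python
from itertools import chain, combinations

# A general rule (X, not Z)/y is encoded as (X, Z, y) with X and Z frozensets.


def least_fixed_point(inductive_rules):
    """Derivable conclusions of a finite inductive rule system given as pairs (X, y)."""
    known = set()
    while True:
        nxt = {y for (xs, y) in inductive_rules if xs <= known}
        if nxt == known:
            return known
        known = nxt


def reduct(rules, b):
    """Drop every rule whose negative premisses meet b; strip negative premisses from the rest."""
    return [(xs, y) for (xs, zs, y) in rules if not (zs & b)]


def is_solution(rules, b):
    """b is stable if it equals the least fixed point of the reduct with respect to b."""
    return least_fixed_point(reduct(rules, b)) == set(b)


def solutions(rules, universe):
    """Enumerate all subsets of the universe that are solutions (exponential, for tiny examples)."""
    names = sorted(universe)
    subsets = chain.from_iterable(combinations(names, n) for n in range(len(names) + 1))
    return [set(c) for c in subsets if is_solution(rules, frozenset(c))]


E = frozenset()
R0 = [(frozenset({"x1"}), E, "x1"), (E, frozenset({"x1", "x2"}), "x2")]
R1 = [(E, frozenset({"x1"}), "x3"), (E, frozenset({"x3"}), "x1")]
R = R1 + [(E, frozenset({"x1", "x2"}), "x2")]
```

Under this encoding, `solutions(R0, {"x1", "x2"})` is empty, `solutions(R1, {"x1", "x3"})` contains exactly the two sets with the single element `"x1"` and `"x3"`, respectively, and `solutions(R, {"x1", "x2", "x3"})` contains only the set with the single element `"x1"`.
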


5.3 Solving General Rule Systems 


The observations and definitions for epistemic must/can transition structures and
constructive interpretation, see Sect. 4.2, can now readily be transferred to a more abstract
account for general rule systems. In fact, this reconstructs the “Kripke-Kleene fixpoint”
using under- and over-approximations [11], though now using an inductive partial order.
We also relate the case where the constructive interpretation is not only monotone, but
continuous to knowledge-based programs.

Define, for a universe $U$, the set $\wp^{\pm} U$ as $\{(P, Q) \in \wp U \times \wp U \mid P \subseteq Q\}$ and the
relation $\sqsubseteq^{\pm} \subseteq \wp^{\pm} U \times \wp^{\pm} U$ as $(P, Q) \sqsubseteq^{\pm} (P', Q')$ if, and only if, $P \subseteq P'$ and $Q \supseteq Q'$.


Lemma 6. $(\wp^{\pm} U, \sqsubseteq^{\pm}, \bot^{\pm}_U)$ with $\bot^{\pm}_U = (\emptyset, U)$ is an inductive partial order.


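For a general rule system $R$ over $U$ with positive and negative premisses define the
operator $\hat{R} : \wp^{\pm} U \to \wp^{\pm} U$ that describes what must and what can be derived given what
is assumed to be definitely and potentially derivable:

$$\hat{R}(P, Q) = (\{y \in U \mid \text{ex. } (X, \neg Z)/y \in R \text{ s.t. } X \subseteq P,\ Q \cap Z = \emptyset\},\ \{y \in U \mid \text{ex. } (X, \neg Z)/y \in R \text{ s.t. } X \subseteq Q,\ P \cap Z = \emptyset\})$$

This is well-defined: if $(P, Q) \in \wp^{\pm} U$, then $\hat{R}(P, Q) \in \wp^{\pm} U$, since for $P \subseteq Q$ and each
$(X, \neg Z)/y \in R$ with $X \subseteq P$ and $Q \cap Z = \emptyset$ it holds that $X \subseteq Q$ and $P \cap Z = \emptyset$. The
operator is always monotone: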


Lemma 7. Let $R$ be a rule system over $U$. If $(P_1, Q_1) \sqsubseteq^{\pm} (P_2, Q_2)$, then $\hat{R}(P_1, Q_1) \sqsubseteq^{\pm}
\hat{R}(P_2, Q_2)$.


As for constructive interpretation, Pataraia’s fixed-point theorem now guarantees that
the monotone operator $\hat{R}$ on the inductive partial order $(\wp^{\pm} U, \sqsubseteq^{\pm}, \bot^{\pm}_U)$ has a least fixed
point. Again, it can be “computed” by possibly transfinite iterated application of $\hat{R}$ to
$\bot^{\pm}_U$. If, however, $\hat{R}$ is even continuous, then, by Kleene’s fixed-point theorem, it suffices
to consider all finite approximations, i.e., $\mu\hat{R} = \bigsqcup^{\pm}_{n \in \mathbb{N}} \hat{R}^n(\bot^{\pm}_U)$; that $\hat{R}$ is continuous
means that if $\Delta \subseteq \wp^{\pm} U$ is directed, then $\bigsqcup^{\pm} \hat{R}(\Delta) = \hat{R}(\bigsqcup^{\pm} \Delta)$.


Lemma 8. Let $R$ be a rule system over $U$ such that every rule of $R$ has only finitely
many positive and negative premisses. Then $\hat{R}$ is continuous.


The rule system for an epistemically guarded transition system $\Gamma = (S, E, L, S_0, \mathcal{T})$
over $(P, A)$ always has only finitely many positive premisses; if for each $s \in S$ and each
$a \in A$ the set $\{s' \in S \mid (s, s') \in E_a\}$ is finite, then there are also only finitely many
negative premisses, such that the corresponding must/can operator is continuous.
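
For a finite universe, the must/can operator and its iteration from the bottom element $(\emptyset, U)$ can be sketched in a few lines of Python. This is an illustration under assumptions, not the TEmIc implementation: a general rule $(X, \neg Z)/y$ is encoded as a triple `(X, Z, y)`, and the names `must_can_step`, `least_must_can_fixed_point`, `R_9C`, and `DECIDED` are hypothetical.

```python
def must_can_step(rules, p, q):
    """One application of the must/can operator to the pair (p, q) with p a subset of q.

    must: positive premisses are definitely derivable (in p) and
          no negative premiss is even potentially derivable (in q);
    can:  positive premisses are potentially derivable (in q) and
          no negative premiss is definitely derivable (in p).
    """
    must = {y for (xs, zs, y) in rules if xs <= p and not (zs & q)}
    can = {y for (xs, zs, y) in rules if xs <= q and not (zs & p)}
    return must, can


def least_must_can_fixed_point(rules, universe):
    """Iterate from the bottom element (empty set, universe) until nothing changes.

    Terminates for finite universes: the lower bound only grows, the upper bound only shrinks.
    """
    p, q = set(), set(universe)
    while True:
        p_next, q_next = must_can_step(rules, p, q)
        if (p_next, q_next) == (p, q):
            return p, q
        p, q = p_next, q_next


E = frozenset()

# The rule system R of Ex. 9(c).
R_9C = [
    (E, frozenset({"x1"}), "x3"),
    (E, frozenset({"x3"}), "x1"),
    (E, frozenset({"x1", "x2"}), "x2"),
]

# A hypothetical system with a decided least fixed point:
# "a" is an axiom, "c" follows from "a" unless "b" is derivable, and nothing derives "b".
DECIDED = [(E, E, "a"), (frozenset({"a"}), frozenset({"b"}), "c")]
```

For `R_9C` the iteration stops immediately at the undecided pair consisting of the empty set and the whole universe, although the system has a unique solution; for `DECIDED` lower and upper bound meet after two steps in the set containing `"a"` and `"c"`.
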


6 Reasoning About Knowledge-based Programs 


We have implemented the constructive interpretation of knowledge-based programs in 
the prototypical “Temporal Epistemic Model Interpreter and Checker” (TEmIc⁵). The
tool first computes the least constructive fixed point of a (finite state) epistemically
guarded transition system. If the least fixed point is decided, the least solution in terms
of epistemic transition structures has been found; otherwise it is checked whether the
re-interpretation using the lower bound of the undecided least fixed point yields a solution.
If either succeeds, properties of the resulting model can be checked. These properties
can be expressed in CTLK, the combination of the branching “Computation Tree Logic”
(CTL) and epistemic logic [21]. What is more, CTLK can also be used in TEmIc for the
action guards. The constructive interpretation just evaluates each universal quantifier
of a CTL formula (A for “on all paths”) over the upper bound and each existential
quantifier (E for “on some path”) over the lower bound. This adds the temporal
dimension to the domain of application of knowledge-based programs. For the run-based
interpreted systems of Fagin et al. [13], Van der Hoek and Wooldridge [20] and Su [27]
provide transformations for linear-time model checking based on local propositions,
though for a fixed set of runs that does not depend on the evaluation of knowledge guards.
The CTLK-model checker MCMAS [21] similarly operates on a fixed, predetermined
model. In dynamic epistemic logic and its model checker DEMO [31], the transition
structure is given by epistemic actions.

⁵ https://bitbucket.org/knappale/temic

We first briefly recapitulate CTLK and then show its constructive evaluation over
epistemic must/can transition structures. We next describe TEmIc by means of the bit
transmission problem and the small paradoxical exercise of the “unexpected examination”;
the TEmIc distribution also contains specifications for the well-known problems “Muddy
Children” [31, pp. 93ff.] and “Sum-and-Product” [31, pp. 96f.]. Finally, we proceed to
an application where CTLK is also used in the action guards: the Java memory model.


6.1 CTLK 


The CTLK-formulæ over $(P, A)$ are defined by the following grammar:


$$\varphi ::= p \mid \mathrm{false} \mid \neg\varphi \mid \varphi_1 \wedge \varphi_2 \mid K_a \varphi \mid \mathrm{EX}\,\varphi \mid \mathrm{EG}\,\varphi \mid \mathrm{E}[\varphi_1 \mathbin{\mathrm{U}} \varphi_2]$$


where $p \in P$ and $a \in A$. The path quantifier E is interpreted as “there is a path”, the
temporal modality X as “in the next step”, G as “always”, and U as “until”. We also
consider the path quantifier A for “on all paths” and the modalities F for “eventually”
and R for “release”, such that $\neg\mathrm{EG}\,\neg\varphi$ is abbreviated by $\mathrm{AF}\,\varphi$ and $\neg\mathrm{E}[\neg\varphi_1 \mathbin{\mathrm{U}} \neg\varphi_2]$
by $\mathrm{A}[\varphi_1 \mathbin{\mathrm{R}} \varphi_2]$. The satisfaction relation $M, s \models \varphi$ of a CTLK-formula $\varphi$ over $(P, A)$
at state $s \in S$ of an epistemic transition structure $M = (S, E, L, S_0, T)$ over $(P, A)$
conservatively extends the satisfaction relation of epistemic formulæ by


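$$\begin{array}{lcl}
M, s \models \mathrm{EX}\,\varphi & \iff & \text{ex. } s_0, s_1, \ldots \in \mathcal{P}(M, s) \text{ s.t. } M, s_1 \models \varphi \\
M, s \models \mathrm{EG}\,\varphi & \iff & \text{ex. } s_0, s_1, \ldots \in \mathcal{P}(M, s) \text{ s.t. } M, s_i \models \varphi \text{ f.a. } i \in \mathbb{N} \\
M, s \models \mathrm{E}[\varphi_1 \mathbin{\mathrm{U}} \varphi_2] & \iff & \text{ex. } s_0, s_1, \ldots \in \mathcal{P}(M, s) \text{ and } l \in \mathbb{N} \text{ s.t. } \\
 & & M, s_i \models \varphi_1 \text{ f.a. } 0 \leq i < l \text{ and } M, s_l \models \varphi_2
\end{array}$$


where $\mathcal{P}(M, s)$ denotes all paths of $M$ starting in $s$, i.e., the infinite state sequences $s_0, s_1, \ldots \in S$
with $s_0 = s$ and $(s_i, s_{i+1}) \in T$ for all $i \in \mathbb{N}$. A CTLK-formula $\varphi$ is valid in $M$, written
$M \models \varphi$, if it is satisfied in all initial states, i.e., $M, s_0 \models \varphi$ for all $s_0 \in S_0(M)$.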
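
For finite structures, the three temporal clauses have the well-known fixed-point characterisations used in explicit-state CTL model checking. The following Python sketch shows them for sets of states; it is not TEmIc's BDD-based implementation. It assumes that the transition relation is total (every state has a successor), so that every finite path prefix extends to an infinite path, and the names `pre`, `ex`, `eg`, `eu`, and the example structure are hypothetical.

```python
def pre(trans, target):
    """States with at least one successor in target."""
    return {s for (s, t) in trans if t in target}


def ex(trans, phi):
    """EX phi: some successor satisfies phi (assumes a total transition relation)."""
    return pre(trans, phi)


def eg(trans, phi):
    """EG phi: greatest fixed point of Z = phi & pre(Z)."""
    z = set(phi)
    while True:
        nxt = z & pre(trans, z)
        if nxt == z:
            return z
        z = nxt


def eu(trans, phi1, phi2):
    """E[phi1 U phi2]: least fixed point of Z = phi2 | (phi1 & pre(Z))."""
    z = set(phi2)
    while True:
        nxt = z | (phi1 & pre(trans, z))
        if nxt == z:
            return z
        z = nxt


# Hypothetical example: state 0 may loop or move to 1, 1 moves to 2, 2 loops.
STATES = {0, 1, 2}
TRANS = {(0, 0), (0, 1), (1, 2), (2, 2)}
P = {0}  # states labelled with p
Q = {2}  # states labelled with q

EF_Q = eu(TRANS, STATES, Q)           # EF q abbreviates E[true U q]
AF_Q = STATES - eg(TRANS, STATES - Q)  # AF q abbreviates not EG not q
```

In this example `EF_Q` contains all three states, whereas `AF_Q` omits state 0, which can loop forever without reaching a `q`-state.
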

For a direct definition of the satisfaction of CTLK-formulæ with an A, the existential 
path quantification for E has to be replaced by universal path quantification. As for simple 
epistemic logic, CTLK including AX y, AG ¢ etc. admits a negation normal form (see, 
e.g., [3, pp. 333f.]). The constructive satisfaction relation of a CTLK-formula in negation
normal form over an epistemic must/can transition structure $\Upsilon = (S, E, L, S_0, T)$ over
(P, A) at a state s € S.,(Y_), written Y, s = y, conservatively extends the constructive 
satisfaction relation of epistemic formule and interprets E over the lower bound Y, and 
A over the upper bound Y, such that, in particular, 


Y,s RFEFy <=> ex. s0,51,...€ P(Y , s) and i E€ N s.t. Y, s; = ọ 
Y,s = AF <> f.a. so, s1,... E AY, s) ex. i E€ Ns.t. Y, s; Ey 


6.2 TEMIc 


TEmIc is a symbolic model interpreter and checker for epistemically guarded transition 
systems using CTLK. It is written in Java and uses binary decision diagrams for state 
space representation [28]; it also supports bounded integers and their arithmetic. Given a 
specification, TEMIc first computes the least constructive fixed point by iterated must/can 
interpretation. If this fixed point is not decided, it checks whether another interpretation
using the lower bound of the fixed point yields a solution. If either succeeds, TEMIc
proceeds with model checking given properties; these statements can be specified as
CTLK-formulæ which have to hold in all initial states or as a reachability query. Reachable
deadlock states without outgoing transitions result in a warning.

For example, the bit transmission problem of the introduction as formalised in Ex. 1(a) 
can be represented as a TEMIc specification as follows (rules are introduced by keyword 
action followed by a name of the rule and the rule definition): 


var sbit, ack, rbit, snt : boolean initial (ack | rbit | snt) <-> false; 


agent S = { sbit, ack }; agent R = { rbit, snt }; 
let R_knows_bit = exists bit:boolean . K[R] sbit <-> bit; 


action S_sends_bit_ok 

guard not K[S] R_knows_bit do rbit := sbit, snt := true; 
action S_sends_bit_failed 

guard not K[S] R_knows_bit do ; 

action R_sends_ack_ok 

guard R_knows_bit and not K[R] K[S] R_knows_bit do ack := true; 
action R_sends_ack_failed 

guard R_knows_bit and not K[R] K[S] R_knows_bit do ; 


Constructive interpretation yields in a few milliseconds the decided least fixed point
of Ex. 2, over which some CTLK-properties can be checked:


check initial EF R_knows_bit;


check initial EF K[S] R_knows_bit; 
check initial EF K[R] K[S] R_knows_bit; 


The first two are reported to hold, but the last does not, since agent R cannot gather
enough information to be sure that the acknowledgement has been received by agent S.

For another example, consider the “unexpected examination” paradox [10, Sect. 4.7,
there called “unexpected hanging”] (for a detailed account see, e.g., [26, Sects. 5.2f.]): A
class is told that within the next week there will be an exam, but it will be a surprise. The
class might reason that the exam cannot happen on Friday, because if there has been no
exam up to Thursday it will not be a surprise on Friday any more; by backward induction
it might reason that there cannot be a surprise exam in the next week at all. This problem
statement can be readily expressed as a TEMIc specification:


var day : 0..5 initial day = 0;
var exam : 0..4;
var written : boolean initial written <-> false;


agent P = { day, written }; 


action act1 

guard day < 5 and (day = exam) and (not K[P] day = exam) and not written 
do written := true, day := day+1; 

action act2 

guard day < 5 and (day != exam) do day := day+1; 

action stutter 

do ; 


Again, constructive interpretation yields in a few milliseconds a decided least fixed 
point. Over this epistemic transition structure we can check that on, e.g., Wednesday the 
exam can be written and still is indeed a surprise: 


check reachable exam = 2 & written; 


For such a reachability check TEmIc also provides a witness that tells that act2 is 
executed twice after which act1 follows. The following CTLK-property, however, is not 
satisfied, as it would have to hold in all initial states — and with exam being 4 the class 
cannot be surprised any more: 


check initial EF written; 


6.3 Memory Models 


Memory models regulate the interaction between threads, their caches, and the main 
memory [23]. The original Java memory model — one of the first formal such models — 
has been harshly criticised for making several compiler optimisations impossible and 
has subsequently been superseded by a more liberal model [17, Ch. 17]. Keeping strong 
guarantees for sequentially consistent, well-synchronised programs, reorderings of data- 
independent statements or early, “prescient” reads from other threads are allowed for 
programs with data races. Still, some limits, like consistency with data or control flow 
dependencies or no “out-of-thin-air” values, should be in force [25,2]. 

For example, in the following two-threaded Java-like program to the left it should be 
possible that both thread-local registers r1 and r2 are assigned the value 1 when reading 
the global, shared variables x and y: A compiler could reorder the data-independent 
statements in both threads. This behaviour, however, should be forbidden in the example 
to the right, since there is a symmetric data dependence. 


x = y = 0                          x = y = 0
r1 = r2 = 0                        r1 = r2 = 0
r1 = x;       r2 = y;              r1 = x;          r2 = y;
y = 1;        x = 1;               if (r1 == 1)     if (r2 == 1)
                                     y = 1;           x = 1;
r1 = r2 = 1?                       r1 = r2 = 1?
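
Why the outcome r1 = r2 = 1 of the left program needs a reordering can be seen by enumerating all interleavings under sequential consistency. The following Python sketch is a plain illustration and not the TEmIc encoding; the statement encoding as pairs `(target, source)` and the names `interleavings`, `outcomes`, `ORIGINAL`, and `REORDERED` are assumptions made for this example.

```python
def interleavings(a, b):
    """Yield all interleavings of two statement lists that keep each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(thread1, thread2):
    """All final values of (r1, r2) under sequentially consistent execution.

    A statement (target, source) copies a variable if source is a name,
    and stores a constant otherwise.
    """
    results = set()
    for schedule in interleavings(thread1, thread2):
        mem = {"x": 0, "y": 0, "r1": 0, "r2": 0}
        for target, source in schedule:
            mem[target] = mem[source] if isinstance(source, str) else source
        results.add((mem["r1"], mem["r2"]))
    return results


# Left program as written: r1 = x; y = 1;  and  r2 = y; x = 1;
ORIGINAL = ([("r1", "x"), ("y", 1)], [("r2", "y"), ("x", 1)])

# The same threads after a compiler has swapped the data-independent statements.
REORDERED = ([("y", 1), ("r1", "x")], [("x", 1), ("r2", "y")])
```

No interleaving of `ORIGINAL` ends with both registers equal to 1, while several interleavings of `REORDERED` do; a memory model that permits this reordering therefore has to admit the outcome for the left program.
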


We want to capture the behaviour of a multi-threaded (Java) program with a liberal 
memory model without having to check all possible compiler transformations; the
correctness of such transformations would actually depend on the program semantics
including the memory model. In fact, in the current Java memory model out-of-order
executions have to be justified by other legal executions. We interpret these justifications
as witnesses in terms of knowledge-based programs; our current exposition, however,
neglects synchronisation. We first represent the state space of a two-threaded (Java)
program like the ones above by the following TEmIc declarations:

var x, y, r1, r2 : 0..2 initial x = 0 & y = 0 & r1 = 0 & r2 = 0;
var step1, step2 : 1..3 initial step1 = 1 & step2 = 1;

agent t1 = { step1, r1 }; agent t2 = { step2, r2 };


The thread agents t1 and t2 can only observe their local registers and their program 
counters. The program steps for both threads are turned into actions like 


action t1_1 guard step1 = 1 do r1 := x, step1 := step1+1;
action t1_2 guard step1 = 2 do y := 1, step1 := step1+1;


Additionally, we allow for a “prescient reading” of the value $v$ from the main memory
variable $x$ by thread $\theta$ into the local variable $r$ at step $s$ by the following action:

action readθ_x_v_r_s
  guard stepθ = s and K[tθ] (EF (r = 0 & x = v) and EF (r = v & x = v))
  do r := v, stepθ := stepθ+1;


Thread $\theta$ can read $v$ from $x$ into $r$ early on if it knows that there is an execution
where $x$ has value $v$ without dependence on already setting $r$ to $v$, and, furthermore, that
there is an execution where the early setting is confirmed. The statement r1 = x; of the
first thread is expanded into three read actions read1_x_0_r1_1, read1_x_1_r1_1,
and read1_x_2_r1_1 plus the plain reading action t1_1. With this encoding, TEMIc
reports that for the first example to the left it is indeed possible to obtain r1 = r2 = 1 in
the least constructive fixed point, but that this is impossible for the example to the right.

A more intriguing case is presented by the following two examples: According
to Manson et al. [23, pp. 35f.] (cf. also [2]), the program to the left can result in
r1 = r2 = r3 = 1:


x = y = 0                          x = y = 0
r1 = r2 = r3 = 0                   r1 = r2 = r3 = 0
r1 = x;          r3 = y;           r1 = x;          r2 = x;     r3 = y;
if (r1 == 0)     x = r3;           if (r1 == 0)     y = r2;     x = r3;
  x = 1;                             x = 1;
r2 = x;                            r1 = r2 = r3 = 1?
y = r2;
r1 = r2 = r3 = 1?


A compiler could see that only 0 and 1 are possible for x and y and “can then replace r2 
= x by r2 = 1, because either 1 was read from x on line 1 and there is no intervening 
write, or 0 was read from x on line 1, 1 was assigned to x on line 3, and there was 
no intervening write”; this definite assignment can be used to transform the last line 
to y = 1; which finally can be made the first action of the first thread, as there are no 
dependencies. But the same transformation is not possible for the program to the right, 
and there the same behaviour should be disallowed. Still, the left program is the result 
of inlining the second thread into the first. Our encoding of the two programs in TEMIc
confirms these considerations and the witness for the left program indeed first sets r3 to
1 and confirms this only in the last step setting y to 1.


7 Conclusions and Future Work 


We have introduced a must/can analysis for the interpretation of knowledge-based 
programs inspired by the constructive semantics of synchronous programming languages. 
The resulting constructive interpretation provides lower and upper bounds for the possible 
executions. This interpretation has been shown to be monotone and to yield a least fixed 
point. We have also transformed knowledge-based programs to general rule systems with 
positive and negative premisses. Finally, we have described our tool TEmIc for constructive 
interpretation and temporal-epistemic model checking over CTLK and demonstrated 
some applications of interpreting knowledge-based programs including CTLK-guards. 

Our epistemic logic could be complemented by group knowledge [14, Ch. 6], like 
common or distributed knowledge. The temporal dimension could be extended to “Linear- 
Time Logic” (LTL), and, more importantly, to include some notion of fairness. Criteria for 
ensuring decided least fixed points for the must/can interpretation beyond synchronicity 
would be desirable. Also a comparison with non-monotone inductive definitions [12], SOS 
rules with negative premisses [24], and solution strategies for epistemic specifications [5], 
would be of interest. On the other hand, the general constructive approach may be 
useful to complement existing intuitionistic approaches to the semantics of synchronous 
programming languages [22]. Finally, the domain of memory models should be covered 
more comprehensively by interpreting knowledge-based programs.


Abstract. Modal types (types that are derived from proof systems of
modal logic) have been studied as theoretical foundations of metaprogramming,
where program code is manipulated as first-class values. In
modal type systems, modality corresponds to a type constructor for code
types and controls free variables and their types in code values. Nanevski
et al. have proposed contextual modal type theory, which has modal types
with fine-grained information on free variables: modal types are explicitly
indexed by contexts, that is, by the types of all free variables in code values.

This paper presents $\lambda_{\forall[]}$, a novel extension of contextual modal type
theory with parametric polymorphism over contexts. Such an extension
has been studied in the literature but, unlike earlier proposals, $\lambda_{\forall[]}$ is
more general in that it allows multiple occurrences of context variables in
a single context. We formalize $\lambda_{\forall[]}$ with its type system and operational
semantics given by $\beta$-reduction and prove its basic properties including
subject reduction, strong normalization, and confluence. Moreover, to
demonstrate the expressive power of polymorphic contexts, we show a
type-preserving embedding from a two-level fragment of Davies’ $\lambda^{\bigcirc}$,
which is based on linear-time temporal logic, to $\lambda_{\forall[]}$.


Keywords: Contextual modal types, Fitch-style modal lambda-calculi, 
Metaprogramming, Polymorphic contexts 


1 Introduction 


It is a common technique in metaprogramming to use code as a first-class value
to generate, combine, and evaluate code at compile- and run-time. Type systems
for first-class code are known to correspond to proof systems of modal
logic under the Curry-Howard isomorphism [5,19,6,30,17]: Modality corresponds
to a type constructor for code types, controlling free variables and their types
in code values. Such modal type systems have been proposed for various areas
of metaprogramming, including multi-stage computation [29,2,13] and syntactic
metaprogramming [7,27], and have more recently been applied to proof assistants [3,21,26].

Modal types come in two flavors: implicit and explicit contexts. On the one
hand, modal types with implicit contexts do not show the typing contexts (free
variables and their types) of code values. A classical example of a modal type
system with implicit contexts is $\lambda^{\bigcirc}$ [5], in which a code type is expressed by
$\bigcirc T$ (“code of $T$”), no matter what variables are referenced in the code. It has
been applied to real programming languages for multi-stage programming, such
as MetaOCaml [2,13]. Since the type operator $\bigcirc$ is derived from the modality
“next” in linear-time temporal logic, we call these code types linear-time
temporal types. On the other hand, modal types with explicit contexts show
typing contexts in code types. For example, the type of code x+2 is expressed
by [x : int]int, which stands for code of an integer expression that includes
free occurrences of an integer variable x. Such types are often called contextual
modal types [17]. Prior work points out that contextual modal types have
advantages over linear-time temporal types in dealing with mutable reference
cells and run-time code evaluation [12,24,14], although they have not been actively
applied to real multi-stage programming languages so far. Contextual modal types are
better known for their applications to proof assistants [20,3,21,26], where users
can operate on code representations of proof terms with explicit contexts.

Some previous work [12,16,3,21,23] on contextual modal types has suggested
polymorphic contexts (polymorphism over typing contexts in contextual modal
types) to abstract part of typing contexts by context variables γ. For example,
a type ∀γ.[γ]T₁ → [γ]T₂ denotes functions that take code of type T₁ under an
arbitrary typing context γ and return code of type T₂ under the same typing
context γ. Although polymorphic contexts are expected to play an important
role in metaprogramming with contextual modal types, their type-theoretic
foundations have not yet been fully investigated.


Our contributions. This paper proposes a novel contextual modal type theory 
λ∀[] that provides a type-theoretic foundation for polymorphic contexts. Our
technical contributions are summarized below: 


– We develop the contextual modal type theory λ∀[] with polymorphic contexts
formally: we give its syntax, type system, and operational semantics given
by β-reduction. A notable feature of λ∀[] is that it allows multiple occurrences
of context variables in a single context, e.g., ∀γ₁.∀γ₂.[γ₁, x : T₁, γ₂]T₂.

– We prove basic properties of λ∀[]: subject reduction, strong normalization,
and confluence. Our strong normalization proof is based on Girard’s
parametric reducibility method, which is adapted to polymorphic contexts.

– To demonstrate the expressive power of polymorphic contexts, we give a
translation from a two-level fragment of λ○ [5] to λ∀[] and prove that the
translation preserves typing. To our knowledge, this is the first result that formally
describes the relation between linear-time temporal types and contextual
modal types. We will see that the key feature of λ∀[], namely that it allows multiple
occurrences of context variables in a single context, plays a vital role.


Organization of the paper. Section 2 provides motivating examples from
metaprogramming. Our formal development starts with a simple Fitch-style modal type
theory λ[] in Section 3. We extend λ[] to λ∀[] with polymorphic contexts and
prove subject reduction in Section 4; we prove strong normalization of λ∀[] in
Section 5. Section 6 develops a sound embedding from linear-time temporal types 
to contextual modal types. Finally, we discuss related work in Section 7 and give 
a conclusion in Section 8. 


2 Motivation 


This section provides examples from common metaprogramming use cases. We 
use a hypothetical OCaml-like language with contextual modal types we present 
later. Note that the language is meant only to illustrate the informal ideas of
the type theory and is not intended as a practical language.


2.1 Simple Contextual Modal Types: Specializing Power Function 


First, we show a typical example from staged computation, the power function, 
to demonstrate how we can use contextual modal types for staged computation. 


(* val pow : int -> [int |- int] *)
let rec pow n = match n with
  | 0 -> ‘<x: int> 1
  | n -> let u = pow (n-1) in ‘<x: int>(x * ,1(u)[x])


(* val power4 : int -> int *)
let power4 = ,0(‘<>(fun x:int -> ,1(pow 4)[x])) []


The function pow generates a piece of code: x * (... * (x * 1)...) that 
multiplies variable x n times; the function power4 puts the code generated by 
pow under function abstraction and evaluates the code at run-time to obtain a 
function value that computes x^4 without recursion.

This example uses two constructs for code manipulation: quote of the form 
‘<Γ>M and unquote of the form ,n(M)[M1,...,Mk]. The former, which is
similar to quasi-quotation in Lisp, generates code of an expression M paired with
a variable environment Γ under which the code is evaluated. In the example,
the quote ‘<x: int> 1 is code of constant 1 with the environment with single 
integer variable x. The quote has a contextual modal type [int |- int], where 
the premise (int on the left of |-) corresponds to the environment x:int and 
the succedent (int on the right) to the code body. 

Given a contextual modal type [C ⊢ T], we call C a context. A context
is a sequence of types and does not involve variables. Similarly to de Bruijn 
indices, we identify variables in a context by their position rather than by their 
names. For instance, two quotes, ‘<x:int, y:int>x and ‘<z:int, w:int>z, 
are considered α-equivalent because both use the first variable in the environment
even though the variable names in the two environments are different. Both terms 
have the same type [int, int |- int]. 
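The positional reading of environments can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: quotes are modeled as a list of (name, type) pairs plus a body string, the function name normalize is hypothetical, and binders inside the body are ignored.

```python
import re

def normalize(env, body):
    """Replace each environment variable in `body` by its position (#0, #1, ...).

    env  : list of (name, type) pairs, e.g. [("x", "int"), ("y", "int")]
    body : the quoted expression as a string (assumed to contain no binders)
    Returns the context (types only) and the position-indexed body.
    """
    context = tuple(ty for _, ty in env)
    for pos, (name, _) in enumerate(env):
        # \b restricts the match to whole identifiers
        body = re.sub(rf"\b{re.escape(name)}\b", f"#{pos}", body)
    return context, body

# The two quotes from the text differ only in variable names.
q1 = normalize([("x", "int"), ("y", "int")], "x")
q2 = normalize([("z", "int"), ("w", "int")], "z")
print(q1 == q2)
```

Both quotes normalize to the same pair of context and body, which mirrors why they are identified and share one type.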

An unquote ,n(M)[M1,...,Mk] is used to expand a code value M. For
example, ,1(u)[x] expands u of type [int |- int]. In addition to the code
to be expanded, an unquote involves two annotations, an explicit substitution
[M1,...,Mk] and a stage transition n. An explicit substitution provides the
definitions of the variables in the environment of a quote value. In the exam- 
ple code, ,1(u) [x] supplies an explicit substitution [x] as the definition for a 
single-variable context int. If u is ‘<y:int>y * 1, then the unquote will ex- 
pand to x * 1, replacing y with its definition x. Roughly speaking, a stage tran- 
sition represents the number of nested quotes surrounding M. The expression 
,1(u) [x] applies the explicit substitution to u, and splices the obtained code into 
the surrounding quote. Thus, the code ‘<x: int>(x * ,1(u)[x]) adds “x *” 
to the code denoted by u. In contrast, the unquote ,0(‘<>(fun x:int ->
,1(pow 4)[x])) [] computes ‘<>(fun x:int -> ,1(pow 4)[x]) (to obtain the
code value fun x:int -> (x * (x * (x * (x * 1)))) with the empty
environment) and expands it; since there is no surrounding quote, the expansion
amounts to running the code. In this sense, the unquote in this language can be
considered as unquote in Lisp-like languages if the stage transition is 1 and as
an eval function if it is 0.
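As a rough analogue in an existing language, the following Python sketch mimics the staging behaviour of pow and power4 with string-based code generation. It is an illustration only: Python strings carry none of the typing guarantees of contextual modal types, and the names gen_pow, power4_src, and power4 are assumptions of this sketch.

```python
def gen_pow(n, var="x"):
    """Generate code for var * (... * (var * 1)...) with n factors of var."""
    if n == 0:
        return "1"                      # plays the role of the quote of constant 1
    inner = gen_pow(n - 1, var)         # code generated for n - 1
    return f"({var} * {inner})"         # splice it under "var * ..."

# Put the generated code under a function abstraction and run it;
# eval here stands in for the unquote with stage transition 0.
power4_src = f"lambda x: {gen_pow(4)}"
power4 = eval(power4_src)

print(power4_src)
print(power4(3))
```

The generated function contains no recursion: all recursive calls happen while the code string is being built.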


2.2 Polymorphic Contexts: Macro repeat 


Secondly, consider a macro called repeat, which repeats a given piece of code 
n times. For example, we expect Lisp code (repeat 2 (print "hello")) to 
show hello two times. We can imitate such a macro as follows: 


(* val repeat : int -> [string -> unit |- unit]
                      -> [string -> unit |- unit] *)
let rec repeat n body = match n with
  | 0 -> ‘<pr: string -> unit>(())
  | n -> let u = repeat (n-1) body in
         ‘<pr: string -> unit>(,1(u)[pr]; ,1(body)[pr])


This function repeat takes an integer n for the number of repetitions and code 
to be repeated. For example, a macro call in Lisp (repeat 2 (print "hello")) 
can be represented as follows.


,1(repeat 2 ‘<pr:string -> unit>(pr "hello")) [print] 


To model macro expansion, we assume the whole code with macro calls is 
surrounded by a quote; hence, we use the stage transition 1, instead of 0, 
to splice the result of the macro call of repeat. Note that the environment 
pr:string -> unit is expected to be the function print. After applying the 
function repeat, we obtain the following code. 


,1(‘<pr:string -> unit>(pr "hello"; pr "hello"; ())) [print] 


Finally, by evaluating the unquote, the code is fully expanded (substituting the
library function print for pr) to


print "hello"; print "hello"; (). 

A problem with the function repeat is that it accepts only code values with an
environment that consists of a single variable of type string -> unit. We would rather
expect the function to accept code values with various patterns of contexts and
to have multiple types that differ only in contexts, e.g.,


– int -> [string -> unit |- unit] -> [string -> unit |- unit],
– int -> [string -> unit, int, int |- unit]
      -> [string -> unit, int, int |- unit], and
– int -> [unit -> unit |- unit] -> [unit -> unit |- unit].


We resolve this issue by abstracting the context part of the function type
with a context variable G. As a result, we obtain the type for generic repeat:
forall G. int -> [G |- unit] -> [G |- unit]. We call a type starting with
forall G. a polymorphic context type, which means that we can instantiate the
context variable G with any context. We can implement this generic function
poly_rep by using a context variable as follows.


(* val poly_rep : forall G. int -> [G |- unit] -> [G |- unit] *)
let rec poly_rep [G] n body =
  match n with
  | 0 -> ‘<xs: G>(())
  | n -> let u = poly_rep [G] (n-1) body in
         ‘<xs: G>(,1(u)[xs]; ,1(body)[xs])


This function takes an additional context argument G, which is used in quotes. 
xs is a series variable, which is a novel sort of variables in this paper. A series 
variable stands for a sequence of (ordinary) variables—corresponding to the fact 
that a context variable stands for a sequence of types—and forms an environment 
by pairing with a context variable. For example, xs:G will represent environment 
x:int, y:string if we substitute x, y for xs, and int, string for G. We can 
also use series variables for explicit substitution. If we use a series variable in an
explicit substitution, as in ,1(u)[xs], xs stands for an explicit substitution that
consists of a series of variables. For instance, if xs:G expands to x:int, y:string,
then ,1(u)[xs] expands to ,1(u)[x,y]. In this case, series variables work
like identity substitutions in prior work [26,3,21,23], which pass variables from 
an environment to explicit substitutions as-is. 

Using poly_rep, we can repeat code with two variables as follows: 


poly_rep [unit->int, int->unit] 3 
(‘<rand:unit->int, printInt:int->unit>(printInt (rand()))) 


We apply poly_rep to the context unit->int, int->unit in order to instantiate the
context variable G. It is worth noting that the series variables accompanied by G
will also be replaced automatically with fresh variables. In this case, the quote
‘<xs: G>(,1(u)[xs]; ,1(body)[xs]) will turn into


‘<x: unit->int, y: int->unit>(,1(u)[x,y]; ,1(body)[x,y])


where the series variable xs is replaced with fresh variables x,y. This way, a 
mapping between variables and types is well maintained. 


2.3 More Polymorphic Contexts: Combining Different Environments


Sometimes, we might want to use pieces of code with different environments. 
Consider a function generic_plus, which takes two pieces of code as arguments 
and returns a piece of code that sums the values of the two arguments. We can 
implement such a function with ease. 


(* val generic_plus: 
forall G H. [G |- int] -> [H |- int] -> [G, H |- int] *) 
let generic_plus [G H] x y = ‘<xs:G, ys:H>(,1(x)[xs] + ,1(y) [ys]) 


It takes two context variables G and H and puts them together in the same con- 
text. As a result, we can use variables from both contexts. Although this example 
is very simple, it demonstrates the novel feature of our contextual modal type 
theory: it permits multiple occurrences of context variables in the same context, 
as in [G, H |- int]. As far as we understand, previous work that supports 
context polymorphism only allows a single occurrence of context variables. We 
discuss the details in Section 7.

One may wonder whether multiple occurrences of context variables are useful. 
As we show in Section 6, this novel feature is crucial to achieving the
expressiveness of the multi-stage programming languages in the literature.


3 Simple Fitch-Style Contextual Modal Type Theory 


As an introduction to contextual modal types, this section formulates a simple
contextual modal type theory λ[] without polymorphic contexts. Nanevski et al. [17]
formulated their original contextual modal type theory in dual-context style [19,6,11],
which has judgments with two-level contexts. In contrast, we formulate λ[] in
so-called Fitch- or Kripke-style [4,1,15,6,31]. We choose this design because the
Fitch-style formulation provides Lisp-like quote/unquote syntax, which is akin 
to that in linear-time temporal type theories [5,30], and hence it is easier to compare
these two type theories. We demonstrate a formal comparison in Section 6. 

We obtain λ[] by extending the S4 Fitch-style modal calculus with contextual
modal type theory. One can consider it a combination of the Fitch-style modal 
calculi by Valliappan et al. [31], and the contextual extension by Nanevski et 
al. [17]. At the same time, we tweak definitions for an extension to polymorphic 
contexts in Section 4. 


3.1 Syntax and Type System 


Types and terms in λ[] are shown in Fig. 1. Types consist of base types ranged
over by ι, function types S → T, and contextual modal types [C ⊢ T]. A
contextual modal type [C ⊢ T] generalizes an S4 modal type □T by adding a
context C, which is a finite sequence of types. It describes code of type T with
free variables whose types are C. Note that a contextual modal type with an
empty context [ε ⊢ T] has the same meaning as □T, which denotes closed code
of type T. In addition to the standard terms of the simply typed lambda calculus, λ[] has
two forms, quote quo(Γ̂) M and unquote unq_k M[θ]. We define stage transitions
as natural numbers, and explicit substitutions as sequences of terms.


\[
\begin{array}{llcl}
\text{Types} & S, T & ::= & \iota \mid S \to T \mid [C \vdash T] \\
\text{Contexts} & C, D & ::= & \varepsilon \mid C, T \\
\text{Stage transitions} & k & \in & \mathbb{N} \\
\text{Terms} & M, N & ::= & x \mid \lambda x^{T}.M \mid M\,N \mid \mathsf{quo}(\hat{\Gamma})\,M \mid \mathsf{unq}_k\,M[\theta] \\
\text{Explicit substitutions} & \theta & ::= & \varepsilon \mid \theta, M \\
\text{Named contexts} & \Gamma, \Delta & ::= & \varepsilon \mid \Gamma, x : T \mid \Gamma, \text{🔒}
\end{array}
\]
(\hat{\Gamma} and \hat{\Delta} denote named contexts with no 🔒.)


Fig. 1. Syntax of λ[]

We often use the term named contexts for typing contexts with variables and
use “contexts” for type-only ones. Similarly to other Fitch-style formulations, λ[]
extends named contexts with a special symbol 🔒 (called lock) that delimits levels
of variables. For example, in a named context x : T₁, 🔒, y : T₂, z : T₃, the variable
x is one level higher than y and z (we will revisit the notion of levels in the
definition of free variables). A named context is well formed iff no variable
occurs in it more than once; we assume that all named contexts are well formed.
We also require a named context in a quote to be single-level, i.e., not to contain
🔒. We write Γ̂ for such lock-free named contexts. rg(Γ̂) denotes the range of Γ̂, a
context obtained by forgetting the variables in Γ̂, and dom(Γ) denotes the domain
of Γ, the set of variables in Γ (locks can appear in Γ, unlike for rg). We also define
the weakening relation Γ₁ ≤ Γ₂ as follows.


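\[
\frac{}{\varepsilon \le \varepsilon}
\qquad
\frac{\Gamma_1 \le \Gamma_2}{\Gamma_1, x : T \le \Gamma_2, x : T}
\qquad
\frac{\Gamma_1 \le \Gamma_2}{\Gamma_1 \le \Gamma_2, x : T}
\qquad
\frac{\Gamma_1 \le \Gamma_2}{\Gamma_1, \text{🔒} \le \Gamma_2, \text{🔒}}
\]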


As is common in other Fitch-style formulations, λ[] has a somewhat complex
binding structure. We show the definition of free variables in Fig. 2. For
a term M and integer k, FV_k(M) is the set of free variables in M at level k,
where k roughly stands for the number of quotes surrounding M. Since an unquote
unq_{k₁} M[θ] cancels k₁ surrounding quotes, the level is lowered by k₁. λ[]
has two binding forms: a lambda abstraction λx^T.M binds all level-0 free
occurrences of x in M, and a quote quo(Γ̂) M binds all level-0 free variables from
Γ̂ in M. According to these binding forms, we define α-equivalence (but omit
its definition). For example, λx^{[T₁ ⊢ T₂]}. quo(y : T₁)(unq₁(x)[y]) is α-equivalent to
λy^{[T₁ ⊢ T₂]}. quo(z : T₁)(unq₁(y)[z]). As we shall see later, the typing rules of λ[]
enforce well-typed terms to be closed with regard to negative-level free variables.
Thus, we only care about non-negative-level free variables in this paper and assume
that the meta variable k ranges over natural numbers.
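The level-indexed notion of free variables can be read as a recursive function. The Python sketch below is one possible reading and rests on assumptions: the class names (Var, Lam, App, Quo, Unq) are hypothetical, types are dropped, and the clauses follow the description above (a quote raises the level of its body by one and binds the body's level-0 variables; an unquote with stage transition k₁ lowers the level of its body by k₁).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Quo:
    binders: tuple   # variable names of the quote's named context
    body: object

@dataclass(frozen=True)
class Unq:
    k: int           # stage transition
    body: object
    subst: tuple     # explicit substitution: a tuple of terms

def fv(term, k):
    """Free variables of `term` at level k (k may be negative)."""
    if isinstance(term, Var):
        return {term.name} if k == 0 else set()
    if isinstance(term, Lam):
        inner = fv(term.body, k)
        return inner - {term.param} if k == 0 else inner
    if isinstance(term, App):
        return fv(term.fun, k) | fv(term.arg, k)
    if isinstance(term, Quo):
        if k == -1:  # the quote binds the level-0 variables of its body
            return fv(term.body, 0) - set(term.binders)
        return fv(term.body, k + 1)
    if isinstance(term, Unq):
        in_subst = set()
        for t in term.subst:
            in_subst |= fv(t, k)
        return fv(term.body, k - term.k) | in_subst
    raise TypeError(f"unknown term: {term!r}")

# (unq_1(x)[y]) y : y occurs at level 0 and x at level 1
term = App(Unq(1, Var("x"), (Var("y"),)), Var("y"))
print(fv(term, 0), fv(term, 1))
```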

\[
\begin{array}{rcl}
FV_k(x) & = & \begin{cases} \{x\} & \text{if } k = 0 \\ \emptyset & \text{otherwise} \end{cases} \\
FV_k(\lambda x^{T}.M) & = & \begin{cases} FV_k(M) - \{x\} & \text{if } k = 0 \\ FV_k(M) & \text{otherwise} \end{cases} \\
FV_k(M\,N) & = & FV_k(M) \cup FV_k(N) \\
FV_k(\mathsf{quo}(\hat{\Gamma})\,M) & = & \begin{cases} FV_0(M) - \mathrm{dom}(\hat{\Gamma}) & \text{if } k = -1 \\ FV_{k+1}(M) & \text{otherwise} \end{cases} \\
FV_k(\mathsf{unq}_{k_1}\,M[\theta]) & = & FV_{k - k_1}(M) \cup FV_k(\theta) \\
FV_k(\varepsilon) & = & \emptyset \\
FV_k(\theta, M) & = & FV_k(\theta) \cup FV_k(M)
\end{array}
\]


Fig. 2. Free variables. This definition assumes k is an integer, but the typing rules
enforce that FV_k(M) = ∅ if k < 0.


Typing rules are given in Fig. 3. The judgment k : Γ ◁ Δ states that there
is a stage transition k between two named contexts Γ and Δ. The rules mean
that k is the number of locks between Γ and Δ, e.g., 0 : x : T ◁ x : T and
2 : y : T₁ ◁ y : T₁, 🔒, 🔒, z : T₂. The judgments Γ ⊢ M : T and Γ ⊢ θ : C state
that term M has type T and explicit substitution θ has context C under named
context Γ, respectively. The rules for variable x, lambda abstraction λx^T.M,
and application M₁ M₂ are almost the same as those in the simply typed lambda
calculus, except that we only care about variables from tail(Γ), the level-0 part
of Γ. The type of a quote quo(Γ̂) M is derived by popping all level-0 variables in
the named context (recall that locks do not appear in Γ̂). Thus, Γ̂ binds all level-0
free variables in M. An unquote unq_k M[θ] uses θ as a substitution for the
context C, and k as the stage transition between M and θ. We call a judgment
derivable when it is derived from these typing rules. We assume that judgments
in this paper are derivable unless stated otherwise.
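The two auxiliary notions used by the typing rules, the level-0 part tail(Γ) and the stage transition between two named contexts, are easy to compute. The following Python sketch assumes a hypothetical encoding in which a named context is a list whose entries are either (name, type) pairs or the marker LOCK; the function names are likewise assumptions of the sketch.

```python
LOCK = "lock"  # stands for the lock symbol in a named context

def tail(ctx):
    """Level-0 part of a named context: the entries after the last lock."""
    out = []
    for entry in reversed(ctx):
        if entry == LOCK:
            break
        out.append(entry)
    return list(reversed(out))

def stage_transition(gamma, delta):
    """Return k such that k is the stage transition from gamma to delta,
    i.e. delta extends gamma and k locks occur in the extension.
    Returns None if delta does not extend gamma."""
    if delta[:len(gamma)] != gamma:
        return None
    return sum(1 for entry in delta[len(gamma):] if entry == LOCK)

gamma = [("y", "T1")]
delta = [("y", "T1"), LOCK, LOCK, ("z", "T2")]
print(stage_transition(gamma, delta))
print(tail([("x", "T1"), LOCK, ("y", "T2"), ("z", "T3")]))
```

Under this encoding, the variable rule only consults tail(Γ), and the unquote rule checks that the stage transition annotation equals the number of locks separating the two contexts.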


3.2 Substitution 


We define substitution on terms and explicit substitutions. We follow the style 
of Valliappan et al. [31], which proposes simultaneous substitution on all free 
variables with any level. We provide definitions related to substitutions in Fig. 4. 


A substitution typing judgment ⊢ σ : Δ ⇒ Γ denotes that we can replace
a named context Δ with another named context Γ by applying a substitution σ, e.g.,
⊢ (z := x y) : (z : T₂) ⇒ (x : T₁ → T₂, y : T₁). A lock substitution 🔒_k has two roles.
First, it provides information on the level of free variables to be substituted.
For example, if σ = σ₁, 🔒_k, σ₂ where σ₂ does not have lock substitutions, σ₂
substitutes level-0 free variables, and σ₁ substitutes higher-level free variables.
Second, it replaces the lock itself. If σ has a lock substitution 🔒_k, it
means that σ replaces a lock in Δ with k locks in Γ.
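

\[ \boxed{k : \Gamma \lhd \Delta} \]
\[
\frac{}{0 : \Gamma \lhd \Gamma}
\qquad
\frac{k : \Gamma \lhd \Delta}{k : \Gamma \lhd \Delta, x : T}
\qquad
\frac{k : \Gamma \lhd \Delta}{k + 1 : \Gamma \lhd \Delta, \text{🔒}}
\]

\[ \boxed{\Gamma \vdash M : T} \qquad \boxed{\Gamma \vdash \theta : C} \]
\[
\frac{x : T \in \mathrm{tail}(\Gamma)}{\Gamma \vdash x : T}
\qquad
\frac{\Gamma, x : T_1 \vdash M : T_2}{\Gamma \vdash \lambda x^{T_1}.M : T_1 \to T_2}
\qquad
\frac{\Gamma \vdash M_1 : T_1 \to T_2 \quad \Gamma \vdash M_2 : T_1}{\Gamma \vdash M_1\,M_2 : T_2}
\]
\[
\frac{\Gamma, \text{🔒}, \hat{\Delta} \vdash M : T}{\Gamma \vdash \mathsf{quo}(\hat{\Delta})\,M : [\mathrm{rg}(\hat{\Delta}) \vdash T]}
\qquad
\frac{\Gamma \vdash M : [C \vdash T] \quad \Delta \vdash \theta : C \quad k : \Gamma \lhd \Delta}{\Delta \vdash \mathsf{unq}_k\,M[\theta] : T}
\]
\[
\frac{}{\Gamma \vdash \varepsilon : \varepsilon}
\qquad
\frac{\Gamma \vdash \theta : C \quad \Gamma \vdash M : T}{\Gamma \vdash \theta, M : C, T}
\]

Auxiliary function

\[
\mathrm{tail}(\varepsilon) = \varepsilon
\qquad
\mathrm{tail}(\Gamma, \text{🔒}) = \varepsilon
\qquad
\mathrm{tail}(\Gamma, x : T) = \mathrm{tail}(\Gamma), x : T
\]


Fig. 3. Typing rules of λ[]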


Substitution application on terms M[σ] and explicit substitutions θ[σ] performs
the actual substitution operations. They are defined to satisfy the following
lemma, which is expected from the intuition of substitution typing.


Lemma 1 (Substitution Lemma). If Γ ⊢ M : T and ⊢ σ : Γ ⇒ Δ, then
Δ ⊢ M[σ] : T.


For example, let us consider Γ ⊢ (unq₁(x)[y]) y : T, where Γ = x : [S ⊢ S →
T], 🔒, y : S. We can construct the following substitution that provides a term for
each variable in Γ.


\[ \vdash (x := x', \text{🔒}_0, y := z\,w) : \Gamma \Rightarrow (x' : [S \vdash S \to T], z : S \to S, w : S) \]


This substitution replaces level-0 occurrences of y with z w and level-1 occurrences
of x with x′. The lock substitution 🔒₀ denotes that level-1 free variables of target
terms are mapped to level-0 terms; that is why the level-0 term x′ is supplied for
the level-1 variable x. We can observe that the substitution is applied as follows.


\[
\begin{array}{rlr}
& ((\mathsf{unq}_1(x)[y])\,y)[x := x', \text{🔒}_0, y := z\,w] & (1) \\
= & ((\mathsf{unq}_1(x)[y])[x := x', \text{🔒}_0, y := z\,w])\,(y[x := x', \text{🔒}_0, y := z\,w]) & (2) \\
= & (\mathsf{unq}_0(x[x := x'])[y[x := x', \text{🔒}_0, y := z\,w]])\,(y[x := x', \text{🔒}_0, y := z\,w]) & (3) \\
= & (\mathsf{unq}_0(x')[z\,w])\,(z\,w) & (4)
\end{array}
\]


The most interesting step is the one from (2) to (3). The substitution for x
is shifted by one level, and the stage transition of the unquote changes from 1 to 0
to align staging levels. The resulting term is given type T under the new named
context, as the substitution lemma states.
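The step from (2) to (3) relies on two bookkeeping operations on substitutions: computing the new stage transition and dropping the part of the substitution that belongs to lower levels. The Python sketch below shows one way to read them. The encoding (a list of ("bind", name, term) and ("lock", k) entries) and the names count and shift are assumptions of this sketch, chosen to reproduce the worked example.

```python
def count(k, sigma):
    """New stage transition of an unquote with stage transition k under sigma:
    each lock substitution crossed contributes its own number of locks."""
    if k == 0:
        return 0
    if not sigma:
        return k
    *rest, last = sigma
    if last[0] == "lock":
        return count(k - 1, rest) + last[1]
    return count(k, rest)          # skip variable bindings

def shift(sigma, k):
    """Part of sigma that applies k levels up: drop entries up to and
    including the k-th lock substitution, counted from the right."""
    if k == 0 or not sigma:
        return list(sigma)
    *rest, last = sigma
    if last[0] == "lock":
        return shift(rest, k - 1)
    return shift(rest, k)

# The substitution (x := x', lock_0, y := z w) from the example
sigma = [("bind", "x", "x'"), ("lock", 0), ("bind", "y", "z w")]
print(count(1, sigma))   # stage transition of the unquote after substitution
print(shift(sigma, 1))   # substitution applied to the body of the unquote
```

For the example, the stage transition 1 becomes 0, and only the binding for x remains for the body of the unquote, which matches line (3).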
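

\[ \text{Substitution} \quad \sigma ::= \varepsilon \mid \sigma, x := M \mid \sigma, \text{🔒}_k \]

\[ \boxed{\vdash \sigma : \Delta \Rightarrow \Gamma} \]
\[
\frac{}{\vdash \varepsilon : \varepsilon \Rightarrow \Gamma}
\qquad
\frac{\vdash \sigma : \Delta \Rightarrow \Gamma_1 \quad k : \Gamma_1 \lhd \Gamma_2}{\vdash (\sigma, \text{🔒}_k) : (\Delta, \text{🔒}) \Rightarrow \Gamma_2}
\qquad
\frac{\vdash \sigma : \Delta \Rightarrow \Gamma \quad \Gamma \vdash M : T}{\vdash (\sigma, x := M) : (\Delta, x : T) \Rightarrow \Gamma}
\]

\[ \boxed{M[\sigma]} \qquad \boxed{\theta[\sigma]} \]
\[
\begin{array}{rcl}
x[\sigma] & = & \begin{cases} M & \text{if } x := M \in \mathrm{tail}(\sigma) \\ x & \text{otherwise} \end{cases} \\
(\lambda x^{T}.M)[\sigma] & = & \lambda x^{T}.(M[\sigma]) \quad \text{where } x \notin \mathrm{dom}(\mathrm{tail}(\sigma)) \text{ and } x \notin FV_0(\sigma) \\
(M\,N)[\sigma] & = & (M[\sigma])\,(N[\sigma]) \\
(\mathsf{quo}(\hat{\Gamma})\,M)[\sigma] & = & \mathsf{quo}(\hat{\Gamma})\,(M[\sigma, \text{🔒}_1, \mathrm{id}_{\hat{\Gamma}}]) \\
(\mathsf{unq}_k\,M[\theta])[\sigma] & = & \mathsf{unq}_{\mathrm{count}(k, \sigma)}\,(M[\sigma \uparrow k])[\theta[\sigma]] \\
\varepsilon[\sigma] & = & \varepsilon \\
(\theta, M)[\sigma] & = & \theta[\sigma], M[\sigma]
\end{array}
\]

Auxiliary functions

\[
\begin{array}{rcl}
FV_k(\sigma, x := M) & = & FV_k(\sigma) \cup FV_k(M) \\
FV_{k_1}(\sigma, \text{🔒}_{k_2}) & = & \begin{cases} FV_{k_1 - k_2}(\sigma) & \text{if } k_1 \ge k_2 \\ \emptyset & \text{otherwise} \end{cases} \\
\mathrm{tail}(\sigma, x := M) & = & \mathrm{tail}(\sigma), x := M \\
\mathrm{tail}(\sigma, \text{🔒}_k) & = & \varepsilon \\
\mathrm{id}_{\varepsilon} & = & \varepsilon \\
\mathrm{id}_{\Gamma, x : T} & = & \mathrm{id}_{\Gamma}, x := x \\
\mathrm{id}_{\Gamma, \text{🔒}} & = & \mathrm{id}_{\Gamma}, \text{🔒}_1 \\
\mathrm{count}(0, \sigma) & = & 0 \\
\mathrm{count}(k_1 + 1, \varepsilon) & = & k_1 + 1 \\
\mathrm{count}(k + 1, (\sigma, x := M)) & = & \mathrm{count}(k + 1, \sigma) \\
\mathrm{count}(k_1 + 1, (\sigma, \text{🔒}_{k_2})) & = & \mathrm{count}(k_1, \sigma) + k_2 \\
\sigma \uparrow 0 & = & \sigma \\
\varepsilon \uparrow (k + 1) & = & \varepsilon \\
(\sigma, x := M) \uparrow (k + 1) & = & \sigma \uparrow (k + 1) \\
(\sigma, \text{🔒}_{k_1}) \uparrow (k_2 + 1) & = & \sigma \uparrow k_2
\end{array}
\]


Fig. 4. Substitution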


In Fig. 4, we also define identity substitutions that satisfy ⊢ id_Γ : Γ ⇒ Γ
for any Γ. We can confirm that id_Γ does not affect the result of substitution, as
stated in the following lemma. We use this property to define reduction later.


Lemma 2. M[σ] = M[id_Γ, σ] for any Γ.


3.3 Local Soundness/Completeness and Reduction 


According to Pfenning and Davies [19], the introduction and elimination rules 
for a type constructor should satisfy local soundness and local completeness, 
which correspond to β-reduction and η-expansion, respectively. We confirm that
contextual modal types meet those conditions and then define reduction rules. 

Local soundness states that the elimination rule is not too strong. For the 
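case of contextual modal types, we can witness it by the following local
reduction, where we obtain the derivation 𝒟′ by applying the substitution
[id_Γ, 🔒_k, Δ̂ := θ], which we obtain from ℰ and k : Γ ◁ Γ′. Here, Δ̂ := θ denotes
a substitution that maps each variable in Δ̂ to the corresponding term in θ.


\[
\dfrac{\dfrac{\begin{array}{c}\mathcal{D}\\ \Gamma, \text{🔒}, \hat{\Delta} \vdash M : T\end{array}}{\Gamma \vdash \mathsf{quo}(\hat{\Delta})\,M : [C \vdash T]} \qquad \begin{array}{c}\mathcal{E}\\ \Gamma' \vdash \theta : C\end{array} \qquad k : \Gamma \lhd \Gamma'}{\Gamma' \vdash \mathsf{unq}_k\,(\mathsf{quo}(\hat{\Delta})\,M)[\theta] : T}
\quad\Longrightarrow\quad
\begin{array}{c}\mathcal{D}'\\ \Gamma' \vdash M[\mathrm{id}_{\Gamma}, \text{🔒}_k, \hat{\Delta} := \theta] : T\end{array}
\]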


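Local completeness states that the elimination rule is sufficiently strong. We
can confirm this condition by the following local expansion (we assume that
rg(Δ̂) = C).


\[
\begin{array}{c}\mathcal{D}\\ \Gamma \vdash M : [C \vdash T]\end{array}
\quad\Longrightarrow\quad
\dfrac{\dfrac{\begin{array}{c}\mathcal{D}\\ \Gamma \vdash M : [C \vdash T]\end{array} \qquad \Gamma, \text{🔒}, \hat{\Delta} \vdash \mathrm{dom}(\hat{\Delta}) : C \qquad 1 : \Gamma \lhd \Gamma, \text{🔒}, \hat{\Delta}}{\Gamma, \text{🔒}, \hat{\Delta} \vdash \mathsf{unq}_1\,M[\mathrm{dom}(\hat{\Delta})] : T}}{\Gamma \vdash \mathsf{quo}(\hat{\Delta})\,\mathsf{unq}_1\,M[\mathrm{dom}(\hat{\Delta})] : [C \vdash T]}
\]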


These patterns provide base cases for β-reduction and η-expansion. This
paper focuses on β-reduction, which we define as follows.


Definition 1 (β-reduction). We inductively define full reduction relations on
terms and explicit substitutions, →β. We show the main rules other than congruence
below. We also define →β* as the reflexive transitive closure of →β.


\[
(\lambda x^{S}.M)\,N \to_\beta M[x := N]
\qquad
\mathsf{unq}_k\,(\mathsf{quo}(\vec{x} : C)\,M)[\theta] \to_\beta M[\text{🔒}_k, \vec{x} := \theta]
\]


We safely omit identity substitutions in these rules, thanks to Lemma 2.
We do not dive into the basic properties of λ[] for now because we discuss those
of its extension λ∀[] in Sections 4 and 5.


4 Polymorphic Contexts 


This section proposes a novel type theory λ∀[] that extends λ[] with polymorphic
contexts. We quickly go through an overview of its syntax and semantics,
focusing on the differences from λ[]. As the examples in Section 2 suggest, the
critical idea of λ∀[] is the notion of series variables, which can be considered the
term representation for context variables.


4.1 Syntax, Type System, and Substitution 


We provide the syntax of λ∀[] in Fig. 5. First, λ∀[] has two additional sorts of
variables: context variables γ, δ, standing for contexts, and series variables 𝐱, 𝐲,
representing sequences of variables. λ∀[] adds polymorphic context types of the
form ∀γ.T, which binds γ in T. It represents the set of types obtained by
substituting any context C for the context variable γ. Two kinds of terms, Λγ.M
and M@C, are added as introduction and elimination forms for polymorphic context
types. We allow C to include polymorphic context types; thus, polymorphism
in λ∀[] is impredicative. The definition of contexts means that we can abstract
any part of a context with context variables, e.g., ∀γ₁.∀γ₂.[γ₁, ι, γ₂ ⊢ ι]. Accordingly,
series variables can appear in explicit substitutions, and a pair of a series
variable and a context variable can appear in a named context. FV is updated
to accommodate series variables, but we omit the definition here.


\[
\begin{array}{llcl}
\text{Types} & S, T & ::= & \cdots \mid \forall\gamma.T \\
\text{Contexts} & C, D & ::= & \cdots \mid C, \gamma \\
\text{Terms} & M, N & ::= & \cdots \mid \Lambda\gamma.M \mid M@C \\
\text{Explicit substitutions} & \theta & ::= & \cdots \mid \theta, \boldsymbol{x} \\
\text{Named contexts} & \Gamma, \Delta & ::= & \cdots \mid \Gamma, \boldsymbol{x} : \gamma
\end{array}
\]


Fig. 5. Syntax of λ∀[]

It is worth noting that context variables are not subject to staging. This
allows us to use the same context variable across levels; for example, the type
∀γ.[γ ⊢ [γ ⊢ T]] binds both occurrences of γ although they are in different levels.
The definition of free context variables, denoted by FCV(·), is straightforward
and we omit it in this paper.

We give additional typing rules and defining clauses of substitutions in Fig. 6.
We also extend the auxiliary functions such as tail to accommodate the new
syntax, but we omit their definitions. The introduction and elimination rules
for polymorphic context types are similar to those for polymorphic types
in System F [8]. The definition of context substitution T[γ := C] for types is
straightforward and omitted. The other rule for explicit substitutions states that
we can add a series variable 𝐱 to an explicit substitution if 𝐱 : γ appears in the
level-0 part of Γ. The point of the extension of substitution is that a series variable
can only be replaced with another series variable, not an explicit substitution. With
these extensions, we can confirm that the substitution lemma holds as expected.


\[ \boxed{\Gamma \vdash M : T} \qquad \boxed{\Gamma \vdash \theta : C} \]
\[
\frac{\Gamma \vdash M : T \quad \gamma \notin FCV(\Gamma)}{\Gamma \vdash \Lambda\gamma.M : \forall\gamma.T}
\qquad
\frac{\Gamma \vdash M : \forall\gamma.T}{\Gamma \vdash M@C : T[\gamma := C]}
\qquad
\frac{\Gamma \vdash \theta : C \quad \boldsymbol{x} : \gamma \in \mathrm{tail}(\Gamma)}{\Gamma \vdash \theta, \boldsymbol{x} : C, \gamma}
\]

\[ \text{Substitution} \quad \sigma ::= \cdots \mid \sigma, \boldsymbol{x} := \boldsymbol{y} \]

\[ \boxed{M[\sigma]} \qquad \boxed{\theta[\sigma]} \]
\[
(\Lambda\gamma.M)[\sigma] = \Lambda\gamma.(M[\sigma]) \ \text{ if } \gamma \notin FCV(\sigma)
\qquad
(M@C)[\sigma] = (M[\sigma])@C
\]
\[
(\theta, \boldsymbol{x})[\sigma] = \begin{cases} \theta[\sigma], \boldsymbol{y} & \text{if } \boldsymbol{x} := \boldsymbol{y} \in \mathrm{tail}(\sigma) \\ \theta[\sigma], \boldsymbol{x} & \text{otherwise} \end{cases}
\]

\[ \boxed{\vdash \sigma : \Delta \Rightarrow \Gamma} \]
\[
\frac{\vdash \sigma : \Delta \Rightarrow \Gamma \quad \boldsymbol{y} : \gamma \in \mathrm{tail}(\Gamma) \quad \boldsymbol{x} \notin \mathrm{dom}(\Delta)}{\vdash \sigma, \boldsymbol{x} := \boldsymbol{y} : \Delta, \boldsymbol{x} : \gamma \Rightarrow \Gamma}
\]


Fig. 6. Additional typing rules and definitions of substitutions in λ∀[]


4.2 Context Substitution


We also define substitution for context variables, which is the most non-trivial
part of λ∀[]. To describe the core idea of context substitution, let us consider a
term quo(𝐱 : γ)(unq₁ M[𝐱]). If we naively substitute a context T, δ for the
context variable γ in this term, we would obtain quo(𝐱 : (T, δ))(unq₁ M[𝐱]), where
𝐱 : (T, δ) is simply ill formed as a named context. Instead, we take the
following steps.


1. We check the occurrences of γ in the named context of the quote
quo(𝐱 : γ)(unq₁ M[𝐱]), and collect series variables that are associated with γ.
In this case, we have only 𝐱.

2. We generate a series of fresh variables to be substituted for 𝐱. Each variable
corresponds to an element of the new context T, δ. Suppose we generate
new variables x, 𝐲 for T, δ. As a result, we get a variable series substitution
𝐱 := x, 𝐲.

3. We apply the context substitution γ := T, δ to the named context 𝐱 : γ along
with 𝐱 := x, 𝐲. As a result, we get a new named context x : T, 𝐲 : δ.

4. We also apply the variable series substitution to unq₁ M[𝐱] and obtain
unq₁ M[x, 𝐲].

5. As a result, we obtain a substituted term quo(x : T, 𝐲 : δ)(unq₁ M[x, 𝐲]).
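The five steps above can be sketched in Python for a single quote. This is a simplified illustration under stated assumptions: a quote is represented only by its named context (a list of (name, type) pairs) and the argument list of one unquote in its body, fresh names are drawn from a hypothetical supply v0, v1, ..., and ordinary and series variables are not distinguished. The function name instantiate is hypothetical.

```python
import itertools

def instantiate(named_ctx, args, gamma, new_ctx):
    """Substitute the context `new_ctx` (a list of types) for the context
    variable `gamma` in a quote with named context `named_ctx` and
    explicit substitution `args` (a list of variable names)."""
    used = {name for name, _ in named_ctx}
    fresh_ids = (f"v{i}" for i in itertools.count())  # hypothetical fresh-name supply
    series_subst = {}   # series variable -> list of fresh variables (step 2)
    out_ctx = []
    for name, ty in named_ctx:
        if ty != gamma:                 # step 1: only entries paired with gamma change
            out_ctx.append((name, ty))
            continue
        fresh = []
        for _ in new_ctx:               # one fresh variable per element of new_ctx
            v = next(n for n in fresh_ids if n not in used)
            used.add(v)
            fresh.append(v)
        series_subst[name] = fresh
        out_ctx.extend(zip(fresh, new_ctx))           # step 3
    out_args = []
    for a in args:                                    # step 4
        out_args.extend(series_subst.get(a, [a]))
    return out_ctx, out_args                          # step 5

# quo(xs : g)(unq_1 M[xs]) with g := T, d
print(instantiate([("xs", "g")], ["xs"], "g", ["T", "d"]))
```

The result pairs each fresh variable with one element of the new context and passes the same fresh variables to the explicit substitution, so variables and types stay aligned.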


In this way, substitution for context variables essentially requires three
operations: (1) to replace context variables with contexts, (2) to generate fresh
variables to be substituted for series variables, and (3) to replace series variables
with sequences of variables. We start its formal definition with the following
new objects. We write G_v and G_s for infinite sequences of ordinary variables
and series variables without duplication, respectively.


\[
\begin{array}{llcl}
\text{Context substitution} & \Sigma & ::= & \varepsilon \mid \Sigma, \gamma := C \\
\text{Variable series} & \vec{x}, \vec{y} & ::= & \varepsilon \mid \vec{x}, y \mid \vec{x}, \boldsymbol{y} \\
\text{Variable series substitution} & \bar{\sigma} & ::= & \varepsilon \mid \bar{\sigma}, \boldsymbol{x} := \vec{y} \mid \bar{\sigma}, \text{🔒} \\
\text{Variable generator} & G & ::= & (G_v, G_s)
\end{array}
\]


A context substitution Σ maps context variables to contexts, and a variable
series substitution σ̄ maps series variables to variable series, that is, sequences
of ordinary/series variables. Note that series substitution does not affect stage
levels; hence, locks in series substitution are not annotated with stage transitions.
A variable generator consists of streams of non-duplicating variables and series
variables. We use it to generate fresh variables. rg(σ̄) denotes the variable series
obtained from the range of σ̄.

We define application of context substitution in Fig. 7. Application of a
context substitution to types T[Σ] and contexts C[Σ] is straightforward; we simply
replace context variables in a capture-avoiding manner. We omit their definitions
from the figure. In contrast, context substitution on terms M[Σ; σ̄]_G and
explicit substitutions θ[Σ; σ̄]_G comes with not only Σ but also a variable series
substitution σ̄ and a variable generator G. Σ is used to replace context variables
in types in λ-abstractions and Γ̂ in a quote; σ̄ is used to substitute series
variables in explicit substitutions and Γ̂ in a quote. The most interesting is the case
for a quote quo(Γ̂) M: first, a variable series substitution σ̄′ is generated by the
auxiliary function destruct (Step 2 above); second, Σ and the generated σ̄′ are
applied to Γ̂ to yield the new named context (Step 3); finally, we apply Σ and
σ̄, 🔒, σ̄′ to the body of the quote (Step 4), after removing variables in dom(Γ̂)
and generated ones from the generator; here, (G_v, G_s) − S means (G_v \ S, G_s \ S).
The auxiliary function destruct_G(Γ, Σ) scans Γ to find context variables in the
domain of Σ, generates fresh (ordinary/series) variables by using gensyms, and
returns a variable series substitution. gensyms_G(C, V) produces a sequence of
ordinary/series variables of the same length as C; fresh variables are chosen
from the earliest ones in G that are not in V.

For example, consider applying Σ = (γ := T₁, γ′) and the empty variable series
substitution to M = quo(𝐱 : γ, x : ι, 𝐲 : γ) M₀. destruct_G((𝐱 : γ, x : ι, 𝐲 : γ), (γ :=
T₁, γ′)) returns 𝐱 := (x′, 𝐱′), 𝐲 := (y′, 𝐲′) for some fresh x′, 𝐱′, y′, and 𝐲′ (with
respect to G) and, thus, M[Σ; ε]_G is quo(x′ : T₁, 𝐱′ : γ′, x : ι, y′ : T₁, 𝐲′ : γ′) M₁, where
M₁ = M₀[Σ; (ε, 🔒, 𝐱 := (x′, 𝐱′), 𝐲 := (y′, 𝐲′))]_{G′} and G′ = G − {𝐱, x, 𝐲, x′, 𝐱′, y′, 𝐲′}.
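The behaviour of destruct and gensyms on this example can be mimicked with a short Python sketch. It is a reading of the prose description rather than a transcription of the formal definition, and it makes several assumptions: named contexts are lists of (name, type) pairs or the marker LOCK, a context substitution is a dict from context variable names to lists of types, the generator is a pair of finite lists standing in for the infinite streams, and the base case for the empty named context is added by this sketch.

```python
LOCK = "lock"  # stands for the lock symbol

def dom(named_ctx):
    """Variable names bound in a named context."""
    return {e[0] for e in named_ctx if e != LOCK}

def rng(series_subst):
    """All variables generated so far by a variable series substitution."""
    return {v for e in series_subst if e != LOCK for v in e[1]}

def gensyms(gen, ctx, avoid, context_vars):
    """One fresh variable per element of ctx: a series variable for a
    context variable, an ordinary variable for any other type."""
    if not ctx:
        return []
    ordinary, series = gen
    *rest, last = ctx
    pool = series if last in context_vars else ordinary
    v = next(n for n in pool if n not in avoid)   # earliest unused name
    return gensyms(gen, rest, avoid | {v}, context_vars) + [v]

def destruct(gen, named_ctx, sigma, context_vars):
    """Variable series substitution for the series variables of named_ctx
    whose context variable is in the domain of sigma."""
    if not named_ctx:
        return []                                  # base case assumed by this sketch
    *rest, last = named_ctx
    prefix = destruct(gen, rest, sigma, context_vars)
    if last == LOCK:
        return prefix + [LOCK]
    name, ty = last
    if ty in sigma:
        avoid = dom(rest) | rng(prefix)
        return prefix + [(name, gensyms(gen, sigma[ty], avoid, context_vars))]
    return prefix

gen = (["x1", "y1", "z1"], ["xs1", "ys1", "zs1"])
ctx = [("xs", "g"), ("x", "i"), ("ys", "g")]
result = destruct(gen, ctx, {"g": ["T1", "g2"]}, {"g", "g2"})
print(result)
```

Each series variable paired with the replaced context variable receives one ordinary and one series variable, and the second occurrence avoids the names already handed out to the first.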

We can confirm that context substitution preserves derivable judgments.

Lemma 3 (Context Substitution Lemma).

1. If Γ ⊢ M : T, then Γ[Σ; σ̄] ⊢ M[Σ; σ̄]_{G′} : T[Σ], where σ̄ = destruct_G(Γ, Σ)
and G′ = G − (dom(Γ) ∪ rg(σ̄)), for any Σ and G.

2. If Γ ⊢ θ : C, then Γ[Σ; σ̄] ⊢ θ[Σ; σ̄]_{G′} : C[Σ], where σ̄ = destruct_G(Γ, Σ) and
G′ = G − (dom(Γ) ∪ rg(σ̄)), for any Σ and G.

Although we use variable generators to get fresh variables, the result of
context substitution should be equivalent under renaming. We can confirm this
intuition by the following lemma.


Lemma 4. If Γ ⊢ M : T, σ̄₁ = destruct_{G₁}(Γ, Σ), and σ̄₂ = destruct_{G₂}(Γ, Σ),
then there is a renaming substitution σ such that Γ[Σ; σ̄₁] ⊢ M[Σ; σ̄₂]_{G₂′}[σ] : T[Σ]
for some G₂′.

Corollary 1. If dom(Σ) ∩ FCV(Γ) = ∅ and Γ ⊢ M : T, then M[Σ; ε]_{G₁} =_α
M[Σ; ε]_{G₂}.

Based on this nature of context substitution, we may omit variable generators 
from context substitution applications. 
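

\[ \boxed{M[\Sigma; \bar{\sigma}]_G} \]
\[
\begin{array}{rcl}
x[\Sigma; \bar{\sigma}]_G & = & x \\
(\lambda x^{T}.M)[\Sigma; \bar{\sigma}]_G & = & \lambda x^{T[\Sigma]}.(M[\Sigma; \bar{\sigma}]_G) \\
(M\,N)[\Sigma; \bar{\sigma}]_G & = & (M[\Sigma; \bar{\sigma}]_G)\,(N[\Sigma; \bar{\sigma}]_G) \\
(\mathsf{quo}(\hat{\Gamma})\,M)[\Sigma; \bar{\sigma}]_G & = & \mathsf{quo}(\hat{\Gamma}[\Sigma; \bar{\sigma}'])\,(M[\Sigma; (\bar{\sigma}, \text{🔒}, \bar{\sigma}')]_{G'}) \\
& & \text{where } \bar{\sigma}' = \mathrm{destruct}_G(\hat{\Gamma}, \Sigma) \text{ and } G' = G - (\mathrm{dom}(\hat{\Gamma}) \cup \mathrm{rg}(\bar{\sigma}')) \\
(\mathsf{unq}_k\,M[\theta])[\Sigma; \bar{\sigma}]_G & = & \mathsf{unq}_k\,(M[\Sigma; \bar{\sigma} \uparrow k]_G)[\theta[\Sigma; \bar{\sigma}]_G] \\
(\Lambda\gamma.M)[\Sigma; \bar{\sigma}]_G & = & \Lambda\gamma.(M[\Sigma; \bar{\sigma}]_G) \quad \text{if } \gamma \notin \mathrm{dom}(\Sigma) \text{ and } \gamma \notin FCV(\Sigma) \\
(M@C)[\Sigma; \bar{\sigma}]_G & = & (M[\Sigma; \bar{\sigma}]_G)@(C[\Sigma])
\end{array}
\]

\[ \boxed{\theta[\Sigma; \bar{\sigma}]_G} \qquad \boxed{\Gamma[\Sigma; \bar{\sigma}]} \]
\[
\begin{array}{rcl}
\varepsilon[\Sigma; \bar{\sigma}]_G & = & \varepsilon \\
(\theta, M)[\Sigma; \bar{\sigma}]_G & = & (\theta[\Sigma; \bar{\sigma}]_G), (M[\Sigma; \bar{\sigma}]_G) \\
(\theta, \boldsymbol{x})[\Sigma; \bar{\sigma}]_G & = & \begin{cases} (\theta[\Sigma; \bar{\sigma}]_G), \vec{x} & \text{if } \boldsymbol{x} := \vec{x} \in \mathrm{tail}(\bar{\sigma}) \\ (\theta[\Sigma; \bar{\sigma}]_G), \boldsymbol{x} & \text{otherwise} \end{cases} \\
\varepsilon[\Sigma; \bar{\sigma}] & = & \varepsilon \\
(\Gamma, x : T)[\Sigma; \bar{\sigma}] & = & \Gamma[\Sigma; \bar{\sigma}], x : T[\Sigma] \\
(\Gamma, \boldsymbol{x} : \gamma)[\Sigma; \bar{\sigma}] & = & \begin{cases} \Gamma[\Sigma; \bar{\sigma}], \vec{x} : C & \text{if } \boldsymbol{x} := \vec{x} \in \mathrm{tail}(\bar{\sigma}) \text{ and } \gamma := C \in \Sigma \\ \Gamma[\Sigma; \bar{\sigma}], \boldsymbol{x} : \gamma & \text{otherwise} \end{cases} \\
(\Gamma, \text{🔒})[\Sigma; \bar{\sigma}] & = & \Gamma[\Sigma; \bar{\sigma} \uparrow 1], \text{🔒}
\end{array}
\]

Auxiliary functions

\[
\begin{array}{rcl}
\mathrm{destruct}_G((\Gamma, x : T), \Sigma) & = & \mathrm{destruct}_G(\Gamma, \Sigma) \\
\mathrm{destruct}_G((\Gamma, \boldsymbol{x} : \gamma), \Sigma) & = & \begin{cases} \bar{\sigma}, \boldsymbol{x} := \vec{x} & \text{if } \gamma := C \in \Sigma, \\ & \text{where } \bar{\sigma} = \mathrm{destruct}_G(\Gamma, \Sigma) \\ & \text{and } \vec{x} = \mathrm{gensyms}_G(C, \mathrm{dom}(\Gamma) \cup \mathrm{rg}(\bar{\sigma})) \\ \mathrm{destruct}_G(\Gamma, \Sigma) & \text{otherwise} \end{cases} \\
\mathrm{destruct}_G((\Gamma, \text{🔒}), \Sigma) & = & \mathrm{destruct}_G(\Gamma, \Sigma), \text{🔒} \\
\mathrm{gensyms}_{(G_v, G_s)}(\varepsilon, V) & = & \varepsilon \\
\mathrm{gensyms}_{(G_v, G_s)}((C, T), V) & = & \mathrm{gensyms}_{(G_v, G_s)}(C, V \cup \{x\}), x \\
& & \text{where } x \text{ is the first element of } G_v \text{ such that } x \notin V \\
\mathrm{gensyms}_{(G_v, G_s)}((C, \gamma), V) & = & \mathrm{gensyms}_{(G_v, G_s)}(C, V \cup \{\boldsymbol{x}\}), \boldsymbol{x} \\
& & \text{where } \boldsymbol{x} \text{ is the first element of } G_s \text{ such that } \boldsymbol{x} \notin V
\end{array}
\]


Fig. 7. Context substitutions and variable series substitutions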
4.3 Local Soundness and Completeness


Local soundness and local completeness are extended to polymorphic context types as follows. We use context substitution to obtain $\mathcal{D}'$ in the local reduction pattern. In this pattern, we observe $\mathit{destruct}(\Gamma, \gamma := C) = \varepsilon$ because $\gamma \notin FCV(\Gamma)$, and hence we get $\Gamma \vdash M[\gamma := C; \varepsilon] : T[\gamma := C]$. For the local expansion pattern, we have to pick a context variable $\delta$ that is fresh against $\Gamma$.


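Local Soundness

$$
\dfrac{\dfrac{\begin{array}{c}\mathcal{D}\\ \Gamma \vdash M : T\end{array} \quad \gamma \notin FCV(\Gamma)}{\Gamma \vdash \Lambda\gamma.M : \forall\gamma.T}}{\Gamma \vdash (\Lambda\gamma.M)@C : T[\gamma := C]}
\;\Longrightarrow\;
\begin{array}{c}\mathcal{D}'\\ \Gamma \vdash M[\gamma := C; \varepsilon] : T[\gamma := C]\end{array}
$$

Local Completeness

$$
\begin{array}{c}\mathcal{D}\\ \Gamma \vdash M : \forall\gamma.T\end{array}
\;\Longrightarrow\;
\dfrac{\dfrac{\begin{array}{c}\mathcal{D}'\\ \Gamma \vdash M : \forall\gamma.T\end{array}}{\Gamma \vdash M@\delta : T[\gamma := \delta]} \quad \delta \notin FCV(\Gamma)}{\Gamma \vdash \Lambda\delta.(M@\delta) : \forall\delta.(T[\gamma := \delta])}
$$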


As a result, we obtain an additional reduction rule for $\to_\beta$ below.

$(\Lambda\gamma.M)@C \to_\beta M[\gamma := C; \varepsilon]$

By using the substitution and context substitution lemmas, it is not hard to show subject reduction with regard to this $\beta$-reduction.


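Theorem 1 (Subject Reduction).

1. If $\Gamma \vdash M : T$ and $M \to_\beta M'$, then $\Gamma \vdash M' : T$.
2. If $\Gamma \vdash \theta : C$ and $\theta \to_\beta \theta'$, then $\Gamma \vdash \theta' : C$.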


Furthermore, $\beta$-reduction satisfies strong normalization and confluence. We only refer to confluence here because we will prove strong normalization in the next section.

Theorem 2 (Confluence). If $\Gamma \vdash M : T$, $M \to_\beta^* N_1$ and $M \to_\beta^* N_2$, then there exists a term $N_3$ such that $N_1 \to_\beta^* N_3$ and $N_2 \to_\beta^* N_3$. The same holds also for well-typed explicit substitutions.

Proof. We use Newman's lemma [25]. We have strong normalization from Theorem 3 (in Section 5) and weak confluence is easy to show.
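
The shape of this argument (termination plus weak confluence yields confluence) can be illustrated on a finite abstract rewriting system. The following Python sketch is only an illustration under stated assumptions: the relation `step`, the helper names, and the two toy systems are hypothetical and are not part of the calculus; they merely show that weak (local) confluence alone is not enough without termination.

```python
from itertools import product


def reachable(step, start):
    """All elements reachable from `start` in zero or more steps."""
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for b in step.get(a, ()):
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_terminating(step):
    """A finite relation terminates iff no element reaches itself in >= 1 step."""
    return all(a not in reachable(step, b) for a in step for b in step[a])


def joinable(step, a, b):
    """True if a and b have a common reduct."""
    return bool(reachable(step, a) & reachable(step, b))


def is_locally_confluent(step):
    """Every one-step fork a -> b, a -> c can be joined."""
    return all(joinable(step, b, c)
               for a in step
               for b, c in product(step[a], repeat=2))


def is_confluent(step):
    """Every many-step fork can be joined."""
    elems = set(step) | {b for bs in step.values() for b in bs}
    return all(joinable(step, b, c)
               for a in elems
               for b, c in product(reachable(step, a), repeat=2))


# Hypothetical terminating, locally confluent system: a diamond.
diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}

# Hypothetical locally confluent but non-terminating system (b <-> c loop),
# which fails to be confluent: b reaches both normal forms a and d.
looping = {"b": {"a", "c"}, "c": {"b", "d"}, "a": set(), "d": set()}
```

On `diamond` all three checks succeed, matching the conclusion of Newman's lemma; on `looping` local confluence holds but termination and confluence both fail, which is why strong normalization is needed in the proof above.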
5 Parametric Reducibility and Strong Normalization


This section provides a proof of strong normalization of $\beta$-reduction in $\lambda_{\forall[]}$. A common approach to proving strong normalization of a modal calculus is to provide a reduction-preserving translation to another strongly normalizing calculus such as simply typed lambda calculi [15,1]. We tried this approach, reducing strong normalization of $\lambda_{\forall[]}$ to that of System F [8]. However, it turned out not to be straightforward. Instead, we directly prove strong normalization of $\lambda_{\forall[]}$ using reducibility in this paper. We follow Girard's parametric reducibility [8] to define reducibility with polymorphic contexts. We also adopted techniques from logical relations for Fitch-style modal calculi proposed by Valliappan et al. [31] to extend reducibility to our Fitch-style modal type theory. Along with these existing methods, our approach requires several non-trivial extensions of reducibility for contextual modal types, which we detail in this section.
We start with the definition of neutral terms and explicit substitutions.


Definition 2 (Neutral Terms and Explicit Substitutions). 


1. A term $M$ is neutral iff $M$ is either a variable, an application, an unquote, or a context application.
2. An explicit substitution $\theta$ is neutral iff it can be derived from the rules below.

$$
\dfrac{}{\varepsilon \text{ is neutral}}
\qquad
\dfrac{\theta \text{ is neutral} \quad M \text{ is neutral}}{\theta, M \text{ is neutral}}
\qquad
\dfrac{\theta \text{ is neutral}}{\theta, \bar{x} \text{ is neutral}}
$$

The definition of neutral terms is standard, while the one for neutral explicit substitutions is somewhat specific to $\lambda_{\forall[]}$ but straightforward: $\theta$ is neutral iff all terms in $\theta$ are neutral. Then, we define reducibility candidates.
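
As a rough illustration of the neutrality conditions, the following Python sketch models terms and explicit substitutions as small data classes. The constructor names (`Var`, `Lam`, `App`, `Quo`, `Unq`, `CtxLam`, `CtxApp`, `SeriesVar`) are hypothetical, and type annotations and named contexts are deliberately omitted; only the shape of the two definitions is intended to be faithful.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:            # variable x
    name: str


@dataclass(frozen=True)
class Lam:            # lambda abstraction (type annotation omitted)
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:            # application M N
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Quo:            # quote (named context omitted)
    body: "Term"


@dataclass(frozen=True)
class Unq:            # unquote at level k with an explicit substitution
    level: int
    body: "Term"
    subst: "Subst"


@dataclass(frozen=True)
class CtxLam:         # context abstraction
    ctx_var: str
    body: "Term"


@dataclass(frozen=True)
class CtxApp:         # context application M@C (context kept as an opaque string)
    fun: "Term"
    ctx: str


@dataclass(frozen=True)
class SeriesVar:      # series variable occurring in an explicit substitution
    name: str


Term = Union[Var, Lam, App, Quo, Unq, CtxLam, CtxApp]
Subst = Tuple[Union[Term, SeriesVar], ...]


def is_neutral_term(m) -> bool:
    """Neutral terms: variable, application, unquote, or context application."""
    return isinstance(m, (Var, App, Unq, CtxApp))


def is_neutral_subst(theta) -> bool:
    """A substitution is neutral iff every term in it is neutral;
    series variables impose no condition, and the empty tuple is neutral."""
    return all(isinstance(e, SeriesVar) or is_neutral_term(e) for e in theta)
```

For instance, under this encoding an application such as `App(Lam("x", Var("x")), Var("y"))` is neutral even though it contains a redex, whereas the abstraction `Lam("x", Var("x"))` is not.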


Definition 3 (Reducibility Candidates). Given a type $T$, let $R$ be a set of derivable judgments of type $T$. We write $R(\Gamma, M)$ iff $\Gamma \vdash M : T \in R$. $R$ is a reducibility candidate of $T$ iff it satisfies all of the following properties.

CR0 If $R(\Gamma, M)$ and $\Gamma \le \Gamma'$, then $R(\Gamma', M)$.

CR1 If $R(\Gamma, M)$, then $M$ is strongly normalizing with regard to $\to_\beta$.

CR2 If $R(\Gamma, M)$ and $M \to_\beta M'$, then $R(\Gamma, M')$.

CR3 If $M$ is neutral, $\Gamma \vdash M : T$, and $R(\Gamma, M')$ for all $M'$ such that $M \to_\beta M'$, then $R(\Gamma, M)$.

We also define a reducibility candidate of context $C$ similarly.


We abbreviate reducibility candidate as RC. As a next step, we define reducibility candidate assignments to define reducibility with parameters. We only need to care about reducibility candidates of contexts because $\lambda_{\forall[]}$ does not have polymorphic types.

RC assignment $\hat{\Sigma} ::= \varepsilon \mid \hat{\Sigma}, \gamma : C := R$ (where $R$ is an RC of $C$)

$\hat{\Sigma}$ is well-formed if it does not have duplicating context variables in it. We assume that all reducibility candidate assignments are well-formed. We write $\mathit{dom}(\hat{\Sigma})$ for the set of context variables on the left side of $:=$ in $\hat{\Sigma}$, and $\Sigma$ for the context substitution that we can obtain by forgetting RCs in $\hat{\Sigma}$.

On top of that, we define reducibility with parameters. 


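Definition 4 (Parametric Reducibility). Given an RC assignment $\hat{\Sigma}$, a type $T$, and a context $C$ where $FCV(T) \subseteq \mathit{dom}(\hat{\Sigma})$ and $FCV(C) \subseteq \mathit{dom}(\hat{\Sigma})$, we define $Red_T[\hat{\Sigma}]$ and $Red_C[\hat{\Sigma}]$, a set of derivable judgments of a type $T[\Sigma]$ and a context $C[\Sigma]$, respectively, as follows. We write $Red_T[\hat{\Sigma}](\Gamma, M)$ iff $\Gamma \vdash M : T[\Sigma] \in Red_T[\hat{\Sigma}]$; similarly for $Red_C[\hat{\Sigma}](\Gamma, \theta)$.
- If $T = \iota$, then $Red_T[\hat{\Sigma}](\Gamma, M)$ iff $M$ is strongly normalizing with regard to $\to_\beta$.
- If $T = T_1 \to T_2$, then $Red_T[\hat{\Sigma}](\Gamma, M)$ iff $Red_{T_2}[\hat{\Sigma}](\Delta, M\,N)$ for any $\Delta$ and $N$ such that $\Gamma \le \Delta$ and $Red_{T_1}[\hat{\Sigma}](\Delta, N)$.
- If $T = [C \vdash T']$, then $Red_T[\hat{\Sigma}](\Gamma, M)$ iff $Red_{T'}[\hat{\Sigma}](\Delta', \mathsf{unq}_k\,M[\theta])$ for any $\Delta$, $\Delta'$, $k$ and $\theta$ such that $\Gamma \le \Delta$, $k : \Delta \lhd \Delta'$ and $Red_C[\hat{\Sigma}](\Delta', \theta)$.
- If $T = \forall\gamma.T'$, then $Red_T[\hat{\Sigma}](\Gamma, M)$ iff $Red_{T'}[\hat{\Sigma}, \gamma : C := R](\Gamma, M@C)$ for any $C$ and an RC $R$ of $C$.
- If $C = \varepsilon$, then $Red_C[\hat{\Sigma}](\Gamma, \theta)$ always holds (where $\theta$ is always $\varepsilon$).
- If $C = C', T$, then $Red_C[\hat{\Sigma}](\Gamma, \theta)$ iff $Red_{C'}[\hat{\Sigma}](\Gamma, \theta')$ and $Red_T[\hat{\Sigma}](\Gamma, M)$ where $\theta = \theta', M$.
- If $C = C', \gamma$, then $Red_C[\hat{\Sigma}](\Gamma, \theta)$ iff $Red_{C'}[\hat{\Sigma}](\Gamma, \theta_1)$ and $R(\Gamma, \theta_2)$ for some $\theta_1$, $\theta_2$, and $R$ such that $\theta = \theta_1, \theta_2$ and $\gamma : D := R \in \hat{\Sigma}$.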


The definition for context variables is somewhat complicated. As $(C', \gamma)[\Sigma] = C'[\Sigma], D$, we need two reducible explicit substitutions $\theta_1$ and $\theta_2$ where $\theta_1$ is for $C'[\Sigma]$ and $\theta_2$ for $D$. Because $D$ comes from the context variable $\gamma$, we use the RC $R$ from $\hat{\Sigma}$ to confirm that $\theta_2$ is reducible.

The parametric reducibility is a reducibility candidate in fact, stated as the 
following lemma. 


Lemma 5. 1. $Red_T[\hat{\Sigma}]$ is an RC of $T[\Sigma]$.
2. $Red_C[\hat{\Sigma}]$ is an RC of $C[\Sigma]$.


We prove a few more auxiliary lemmas for the basic lemma. Firstly, we con- 
firm that context substitution on types or context can be lifted to reducibility 
assignment. 


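Lemma 6. 1. $Red_{T[\gamma := C]}[\hat{\Sigma}] = Red_T[\hat{\Sigma}, \gamma : C[\Sigma] := Red_C[\hat{\Sigma}]]$.
2. $Red_{D[\gamma := C]}[\hat{\Sigma}] = Red_D[\hat{\Sigma}, \gamma : C[\Sigma] := Red_C[\hat{\Sigma}]]$.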


Besides, we state three lemmas that correspond to introduction of function 
types, contextual modal types, and polymorphic context types. 


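Lemma 7. If $\Gamma, x : S[\Sigma] \vdash M : T[\Sigma]$ and $Red_T[\hat{\Sigma}](\Gamma', M[\mathit{id}_\Gamma, x := N])$ for any $\Gamma'$ and $N$ such that $\Gamma \le \Gamma'$ and $Red_S[\hat{\Sigma}](\Gamma', N)$, then $Red_{S \to T}[\hat{\Sigma}](\Gamma, \lambda x^{S[\Sigma]}.M)$.

Lemma 8. If $\Gamma, 🔒, \vec{x} : C[\Sigma] \vdash M : T[\Sigma]$ and $Red_T[\hat{\Sigma}](\Gamma_2, M[\mathit{id}_\Gamma, 🔒_k, \vec{x} := \theta])$ for any $\Gamma_1$, $\Gamma_2$, $k$ and $\theta$ such that $\Gamma \le \Gamma_1$, $k : \Gamma_1 \lhd \Gamma_2$ and $Red_C[\hat{\Sigma}](\Gamma_2, \theta)$, then $Red_{[C \vdash T]}[\hat{\Sigma}](\Gamma, \mathsf{quo}\langle\vec{x} : C[\Sigma]\rangle M)$.

Lemma 9. If $\Gamma \vdash M : T[\Sigma]$, $\gamma \notin FCV(\Gamma) \cup FCV(\Sigma) \cup \mathit{dom}(\Sigma)$, and $Red_T[\hat{\Sigma}, \gamma : C := R](\Gamma, M[\gamma := C; \varepsilon])$ for any $C$, $R$ such that $R$ is an RC of $C$, then $Red_{\forall\gamma.T}[\hat{\Sigma}](\Gamma, \Lambda\gamma.M)$.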


We can prove these lemmas by CR3 and induction on the number of reduction 
steps of strongly normalizing terms/explicit substitutions. 

Before the basic lemma, we define reducibility for named contexts. Although we would like something like $Red_\Gamma[\hat{\Sigma}]$, this definition does not work because it does not have information on how a named context with series variable $\bar{x} : \gamma$ will be replaced. Therefore we also need to pass a series variable substitution, like $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}]$, in the same way as context substitution for named contexts.


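Definition 5 (Reducibility for Substitution). Given an RC assignment $\hat{\Sigma}$, a named context $\Gamma$, and a series substitution $\bar{\sigma}$ where $FCV(\Gamma) \subseteq \mathit{dom}(\hat{\Sigma})$, we define $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}]$, a set of derivable judgments of a named context $\vdash \sigma : \Gamma[\Sigma; \bar{\sigma}] \Rightarrow \Delta$, as follows. We write $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma)$ iff $\vdash \sigma : \Gamma[\Sigma; \bar{\sigma}] \Rightarrow \Delta \in Red_\Gamma[\hat{\Sigma}, \bar{\sigma}]$.

- If $\Gamma = \varepsilon$, then $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma)$ always holds (where $\sigma = \varepsilon$).

- If $\Gamma = \Gamma', x : T$, then $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma)$ iff $Red_{\Gamma'}[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma')$ and $Red_T[\hat{\Sigma}](\Delta, M)$ for some $\sigma'$, $M$ such that $\sigma = (\sigma', x := M)$.

- If $\Gamma = \Gamma', \bar{x} : \gamma$, then $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma)$ iff $Red_{\Gamma'}[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma')$ and $R(\Delta, \theta)$ for some $\sigma'$, $\theta$ and $R$ such that $\gamma : C := R \in \hat{\Sigma}$, $\sigma = (\sigma', \vec{y} := \theta)$ and $\bar{x} := \vec{y} \in \bar{\sigma}$ ($\vec{y} := \theta$ is a point-wise mapping between $\vec{y}$ and $\theta$).

- If $\Gamma = \Gamma', 🔒$, then $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma)$ iff $Red_{\Gamma'}[\hat{\Sigma}, \bar{\sigma}](\Delta \uparrow k, \sigma')$ for some $\sigma'$ and $k$ such that $\sigma = (\sigma', 🔒_k)$.

We use the series variable substitution in the third rule to generate a substitution for $(\bar{x} : \gamma)[\Sigma; \bar{\sigma}] = \vec{y} : C$. Finally, we prove the basic lemma.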


Lemma 10 (Basic Lemma). 


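- If $\Gamma \vdash M : T$ and $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma')$ where $\bar{\sigma} = \mathit{destruct}(\Gamma, \Sigma)$, then $Red_T[\hat{\Sigma}](\Delta, M[\Sigma; \bar{\sigma}][\sigma'])$.

- If $\Gamma \vdash \theta : C$ and $Red_\Gamma[\hat{\Sigma}, \bar{\sigma}](\Delta, \sigma')$ where $\bar{\sigma} = \mathit{destruct}(\Gamma, \Sigma)$, then $Red_C[\hat{\Sigma}](\Delta, \theta[\Sigma; \bar{\sigma}][\sigma'])$.

Strong normalization is proved as a special case of the basic lemma, where we choose $\hat{\Sigma}$, $\bar{\sigma}$ and $\sigma'$ as identity substitutions respectively.

Theorem 3 (Strong Normalization). If $\Gamma \vdash M : T$, then $M$ is strongly normalizing with regard to $\to_\beta$.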
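Level-0 Types   $T^0, S^0 ::= \iota \mid S^0 \to T^0 \mid \bigcirc T^1$
Level-0 Terms   $M^0, N^0 ::= x \mid \lambda x^{T^0}.M^0 \mid M^0\,N^0 \mid \mathsf{quo}\,M^1$
Level-1 Types   $T^1, S^1 ::= \iota \mid S^1 \to T^1$
Level-1 Terms   $M^1, N^1 ::= x \mid \lambda x^{T^1}.M^1 \mid M^1\,N^1 \mid \mathsf{unq}\,M^0$
Named Contexts  $\Gamma^\bigcirc, \Delta^\bigcirc ::= \cdot \mid \Gamma^\bigcirc, x :^0 T^0 \mid \Gamma^\bigcirc, x :^1 T^1$

$\Gamma^\bigcirc \vdash_i M^i : T^i$  ($i \in \{0, 1\}$)

$$
\dfrac{x :^i T^i \in \Gamma^\bigcirc}{\Gamma^\bigcirc \vdash_i x : T^i}
\qquad
\dfrac{\Gamma^\bigcirc, x :^i T_1^i \vdash_i M^i : T_2^i}{\Gamma^\bigcirc \vdash_i \lambda x^{T_1^i}.M^i : T_1^i \to T_2^i}
\qquad
\dfrac{\Gamma^\bigcirc \vdash_i M^i : T_1^i \to T_2^i \quad \Gamma^\bigcirc \vdash_i N^i : T_1^i}{\Gamma^\bigcirc \vdash_i M^i\,N^i : T_2^i}
$$

$$
\dfrac{\Gamma^\bigcirc \vdash_1 M^1 : T^1}{\Gamma^\bigcirc \vdash_0 \mathsf{quo}\,M^1 : \bigcirc T^1}
\qquad
\dfrac{\Gamma^\bigcirc \vdash_0 M^0 : \bigcirc T^1}{\Gamma^\bigcirc \vdash_1 \mathsf{unq}\,M^0 : T^1}
$$

Fig. 8. Syntax and typing rules of $\lambda^\bigcirc$ (two-level fragment)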


6 Embedding Linear-Time Temporal Type Theory 


In multi-stage computation, contextual modal types are known to overcome weak points of linear-time temporal types from $\lambda^\bigcirc$ by Davies [5], regarding type safety of mutable reference cells and/or run-time code evaluation [12,24,14]. However, simple contextual modal theories, such as $\lambda_{[]}$, are known to be less expressive than linear-time temporal types. That is why polymorphic contexts are explored in the literature, which will endow expressiveness to contextual modal types. Then it is natural to ask if polymorphic contexts are strong enough to express linear-time temporal types. This section proves that the answer is yes, by providing a sound translation from linear-time temporal types to $\lambda_{\forall[]}$. We first define a two-level fragment of $\lambda^\bigcirc$ as a source language to simplify our embedding (Fig. 8). We call the fragment itself $\lambda^\bigcirc$ later in this paper. Then, we discuss the core insights of our embedding from $\lambda^\bigcirc$ and give a formal definition of our embedding from $\lambda^\bigcirc$ to $\lambda_{\forall[]}$. We also prove its soundness (the embedding preserves typing), while a proof that it also preserves semantics is left for future work.

$\lambda^\bigcirc$ has two stages: level-1 is the future stage. We define types and terms for each level (and metavariables are indexed by 0 or 1). A temporal type $\bigcirc T^1$ denotes a code for the future-stage value of $T^1$. Unlike contextual modal types, temporal types do not show context explicitly. Instead, typing judgments hold future-stage named contexts that implicitly represent contexts of those code types. A type judgment $\Gamma^\bigcirc \vdash_i M^i : T^i$ (where $i = 0, 1$) means typing at the stage $i$, where $\Gamma^\bigcirc$ includes variables of both levels. $\lambda^\bigcirc$ also has syntax for quote and unquote as in $\lambda_{\forall[]}$ but they are not annotated with named contexts and explicit substitutions. Typing rules do little with named contexts.

These differences lead to the difference in binding structure. For example, consider a $\lambda^\bigcirc$-term $\lambda f^{\bigcirc T^1 \to \bigcirc T^1}.\mathsf{quo}(\lambda x^{T^1}.\mathsf{unq}(f\,(\mathsf{quo}\,x)))$. In this term, the outer lambda binds the level-0 occurrence of $f$ and the inner lambda binds the level-1 occurrence of $x$, although quo and unq are placed between binders and variable references. To embed $\lambda^\bigcirc$ to $\lambda_{\forall[]}$, we have to emulate this behavior of $\lambda^\bigcirc$.

We design our embedding from $\lambda^\bigcirc$ to $\lambda_{\forall[]}$ based on the following insights. First of all, we naturally embed quote and unquote of $\lambda^\bigcirc$ to those of $\lambda_{\forall[]}$ (by recovering missing annotations). Secondly, we can recover a hidden context of code types in $\lambda^\bigcirc$ from the types of level-1 free variables. For example, in the judgment

$x :^0 \bigcirc\mathsf{int},\; y :^1 \mathsf{int} \vdash_0 \mathsf{quo}\,y : \bigcirc\mathsf{int}$,

the context of the type $\bigcirc\mathsf{int}$ (of $\mathsf{quo}\,y$) should be $\mathsf{int}$ because the named context has a level-1 binding $y :^1 \mathsf{int}$. As a result, $\bigcirc\mathsf{int}$ under $x :^0 \bigcirc\mathsf{int}, y :^1 \mathsf{int}$ is embedded into $[\mathsf{int} \vdash \mathsf{int}]$. Thirdly, recovered contexts of code types sometimes need to be extended. Let us consider the following judgment:

$\vdash_0 \lambda f^{\bigcirc\mathsf{int} \to \bigcirc\mathsf{str}}.\mathsf{quo}(\lambda x^{\mathsf{int}}.\mathsf{unq}(f\,(\mathsf{quo}\,x))) : (\bigcirc\mathsf{int} \to \bigcirc\mathsf{str}) \to \bigcirc(\mathsf{int} \to \mathsf{str})$.

The hidden context of $f$ is empty, and hence the type of $f$ should be $[\varepsilon \vdash \mathsf{int}] \to [\varepsilon \vdash \mathsf{str}]$. However, $f$ is used inside the level-1 binder $\lambda x^{\mathsf{int}}$, and hence this use of $f$ should be typed as $[\mathsf{int} \vdash \mathsf{int}] \to [\mathsf{int} \vdash \mathsf{str}]$. We need to extend the context of the code type as an abstraction under quo extends the level-1 context. Thus, the polymorphic context type $\forall\gamma.[\gamma \vdash \mathsf{int}] \to [\gamma \vdash \mathsf{str}]$ is more appropriate for $f$. In this way, polymorphic contexts allow us to extend the context of an argument of code type, according to where the argument is used.

The formal definition of our embedding is shown in Figure 9. Level-1 types are translated to $\lambda_{\forall[]}$ types in a straightforward manner; the translation of level-0 types carries a context, which is used to signify the context of code types. If it translates a function type, we introduce a polymorphic context type to the argument type so that we can extend the context of the type later. For example, $(\bigcirc\mathsf{int} \to \bigcirc\mathsf{str}) \to \bigcirc(\mathsf{int} \to \mathsf{str})$ translates to $(\forall\gamma.(\forall\delta.[\gamma, \delta \vdash \mathsf{int}]) \to [\gamma \vdash \mathsf{str}]) \to [\varepsilon \vdash \mathsf{int} \to \mathsf{str}]$ under an empty context.

Before discussing term translation, we introduce intermediate named contexts $\hat{\Gamma}$, an intermediate representation of embedded named contexts. Their structure is similar to named contexts in $\lambda^\bigcirc$ while their elements are variables and types of $\lambda_{\forall[]}$. We write $|\hat{\Gamma}|_0$ for the level-0 fragment of $\hat{\Gamma}$ and $|\hat{\Gamma}|_1$ for the level-1 fragment of $\hat{\Gamma}$. The relation $\Gamma^\bigcirc \leadsto \hat{\Gamma}$ means that $\Gamma^\bigcirc$ can be translated into $\hat{\Gamma}$. The point is that $\Gamma^\bigcirc$ can be translated into different intermediate named contexts. For example, the $\lambda^\bigcirc$ named context $x :^1 T^1, y :^0 \bigcirc S^1, z :^0 \bigcirc S^1$ can be translated to both $x :^1 [\![T^1]\!], y :^0 [[\![T^1]\!] \vdash [\![S^1]\!]], z :^0 [[\![T^1]\!] \vdash [\![S^1]\!]]$ and $x :^1 [\![T^1]\!], y :^0 [[\![T^1]\!] \vdash [\![S^1]\!]], \bar{x} :^1 \gamma, z :^0 [[\![T^1]\!], \gamma \vdash [\![S^1]\!]]$ due to the last rule of $\leadsto$. We use this relation to prove the soundness theorem (Theorem 4) later.

Term embedding carries an intermediate named context for two purposes. Firstly, it is used to infer a named context and an explicit substitution for quote and unquote. Secondly, it is used to know a missing context that we need to extend when using level-0 variables. The level-1 types in a named context always translate to polymorphic context types so that we can extend their context when those variables are used. $\mathit{diff}(x, \hat{\Gamma})$ determines the missing context, defined as $\mathit{diff}(x, (\hat{\Gamma}, x :^0 T, \hat{\Delta})) = \mathit{rg}(|\hat{\Delta}|_1)$ (or undefined otherwise).
$[\![T^1]\!]$ and $[\![T^0]\!]_C$

$[\![\iota]\!] = \iota$
$[\![T_1^1 \to T_2^1]\!] = [\![T_1^1]\!] \to [\![T_2^1]\!]$
$[\![\iota]\!]_C = \iota$
$[\![T_1^0 \to T_2^0]\!]_C = (\forall\gamma.[\![T_1^0]\!]_{C,\gamma}) \to [\![T_2^0]\!]_C$
$[\![\bigcirc T^1]\!]_C = [C \vdash [\![T^1]\!]]$

$[\![M]\!]_{\hat{\Gamma}}$ (quote case)

$[\![\mathsf{quo}\,M^1]\!]_{\hat{\Gamma}} = \mathsf{quo}\langle|\hat{\Gamma}|_1\rangle\,[\![M^1]\!]_{\hat{\Gamma}}$

Intermediate Named Context  $\hat{\Gamma} ::= \cdot \mid \hat{\Gamma}, x :^0 T \mid \hat{\Gamma}, x :^1 T \mid \hat{\Gamma}, \bar{x} :^1 \gamma$

Fig. 9. Embedding from $\lambda^\bigcirc$


Finally, we prove the soundness of the translation. 
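Theorem 4 (Soundness of Embedding from $\lambda^\bigcirc$).

- If $\Gamma^\bigcirc \vdash_0 M^0 : T^0$ and $\Gamma^\bigcirc \leadsto \hat{\Gamma}$, then $|\hat{\Gamma}|_0 \vdash [\![M^0]\!]_{\hat{\Gamma}} : [\![T^0]\!]_{\mathit{rg}(|\hat{\Gamma}|_1)}$.
- If $\Gamma^\bigcirc \vdash_1 M^1 : T^1$ and $\Gamma^\bigcirc \leadsto \hat{\Gamma}$, then $|\hat{\Gamma}|_0, 🔒, |\hat{\Gamma}|_1 \vdash [\![M^1]\!]_{\hat{\Gamma}} : [\![T^1]\!]$.

Proof (Sketch). By mutual induction on derivations of $\lambda^\bigcirc$.

We focus on the case of level-0 application. If $M^0 = M_1^0\,M_2^0$, then $\Gamma^\bigcirc \vdash_0 M_1^0 : S^0 \to T^0$ and $\Gamma^\bigcirc \vdash_0 M_2^0 : S^0$ for some $S^0$. By the induction hypothesis, we have the two $\lambda_{\forall[]}$ judgments below.

- $|\hat{\Gamma}|_0 \vdash [\![M_1^0]\!]_{\hat{\Gamma}} : (\forall\gamma.[\![S^0]\!]_{\mathit{rg}(|\hat{\Gamma}|_1), \gamma}) \to [\![T^0]\!]_{\mathit{rg}(|\hat{\Gamma}|_1)}$
- $|\hat{\Gamma}, \bar{x} :^1 \gamma|_0 \vdash [\![M_2^0]\!]_{\hat{\Gamma}, \bar{x} :^1 \gamma} : [\![S^0]\!]_{\mathit{rg}(|\hat{\Gamma}, \bar{x} :^1 \gamma|_1)}$

The second judgment holds because $\Gamma^\bigcirc \leadsto \hat{\Gamma}, \bar{x} :^1 \gamma$ can be derived from $\Gamma^\bigcirc \leadsto \hat{\Gamma}$. We can derive $|\hat{\Gamma}|_0 \vdash \Lambda\gamma.[\![M_2^0]\!]_{\hat{\Gamma}, \bar{x} :^1 \gamma} : \forall\gamma.[\![S^0]\!]_{\mathit{rg}(|\hat{\Gamma}|_1), \gamma}$ from the second judgment considering that $|\hat{\Gamma}, \bar{x} :^1 \gamma|_0 = |\hat{\Gamma}|_0$. Then we can apply this judgment to the first judgment, and we obtain $|\hat{\Gamma}|_0 \vdash [\![M_1^0]\!]_{\hat{\Gamma}}\,(\Lambda\gamma.[\![M_2^0]\!]_{\hat{\Gamma}, \bar{x} :^1 \gamma}) : [\![T^0]\!]_{\mathit{rg}(|\hat{\Gamma}|_1)}$. $\square$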
It is worth noting that this embedding requires multiple occurrences of context variables in a single context: As we have seen, $(\bigcirc\mathsf{int} \to \bigcirc\mathsf{str}) \to \bigcirc(\mathsf{int} \to \mathsf{str})$ translates to $(\forall\gamma.(\forall\delta.[\gamma, \delta \vdash \mathsf{int}]) \to [\gamma \vdash \mathsf{str}]) \to [\varepsilon \vdash \mathsf{int} \to \mathsf{str}]$, where the type $[\gamma, \delta \vdash \mathsf{int}]$ uses two context variables. This fact strongly suggests that context variables in $\lambda_{\forall[]}$ are essential for embedding linear-time temporal types and hence also staged computation.


7 Related Work 


Contextual Modal Type Theory. Early work on calculi for metaprogramming with explicit contexts includes $\lambda^{poly}_{open}$ by Kim et al. [12] and $\nu^\Box$ by Nanevski and Pfenning [16]. On the one hand, $\lambda^{poly}_{open}$ has a Fitch-style-like modal type system with explicit contexts and is type safe in the presence of mutable references and run-time evaluation. On the other hand, $\nu^\Box$ has a dual-context-like modal type system that is type sound with run-time evaluation. Both calculi use symbolic representation for named contexts of quoted code. As a result, names in quoted code are not subject to $\alpha$-conversion. It is worth noting that both papers discuss context polymorphism to achieve flexibility for computation with contexts.

Nanevski et al. refined $\nu^\Box$ to contextual modal type theory (CMTT) [17], allowing $\alpha$-conversion for variables in quoted code. CMTT is very close to our $\lambda_{[]}$ while it employs a dual-context style formulation. We believe it is not difficult to apply polymorphic context types to dual-context CMTT, although we do not explore it in this paper. CMTT provides a basis for several metaprogramming languages [9,20,26]. We expect that $\lambda_{\forall[]}$ will contribute to future designs of metaprogramming languages as well.

One notable difference between CMTT and $\lambda_{[]}$ is that CMTT has a named context inside a contextual modal type, instead of an (unnamed) context. This approach makes $\alpha$-conversion somewhat complicated: a CMTT term $\mathsf{box}(x : T.\,x)$ has a type $[x : T]T$ while an $\alpha$-equivalent term $\mathsf{box}(y : T.\,y)$ has a slightly different type $[y : T]T$. Instead, $\lambda_{[]}$ omits names from contexts in contextual modal types by identifying variables in a context by their positions; hence $\alpha$-equivalent terms always have the same type in $\lambda_{[]}$.


Prior Work on Polymorphic Contexts. Contextual modal type systems have 
been applied to proof assistants [20,3,21,26]. Those proof assistants are designed 
to allow users to inspect code representation of proof terms using contextual 
modal types. In particular, Beluga [20,3] allows users to perform pattern match 
against code with polymorphic contexts, whereas $\lambda_{\forall[]}$ allows only for generative metaprogramming. The prior proposals used an identity substitution $\mathit{id}_\psi$ as a term representation of a context variable $\psi$, whereas we use series variables for that purpose. Type-theoretic formalization of identity substitutions is examined by Puech's unpublished work [23]. He proposed dual-context and Fitch-style contextual modal type theories with polymorphic types and identity substitutions. However, a formalization with identity substitutions introduces a significant restriction: only one occurrence of a context variable is allowed in a single context. Suppose we allow multiple occurrences of context variables in a context with identity substitutions. In that case, we have a term like $\mathsf{quo}\langle\gamma, \gamma\rangle(\mathsf{unq}(x)[\mathit{id}_\gamma])$ that is ill-scoped because we do not know which $\gamma$ is referred to by $\mathit{id}_\gamma$. One might consider introducing a restriction that context variables do not duplicate in a context. However, it is still hard to avoid ill-scoped terms like $(\Lambda\delta.\mathsf{quo}\langle\gamma, \delta\rangle(\mathsf{unq}(x)[\mathit{id}_\gamma]))@\gamma$, which reduces to the previous term. That is why we introduce series variables in $\lambda_{\forall[]}$.


Context Subtyping. Rhiger [24] proposed a Fitch-style contextual modal type system $\lambda^{[]}$ that achieves safe code operation with mutable references and run-time evaluation. An interesting point of $\lambda^{[]}$ is that it employs linear-time flavored named contexts where a quote does not discard a future-stage context, and achieves flexibility of computation with contexts by introducing structural subtyping for contexts. Kiselyov et al. proposed a type system <NJ> with a notion of refined environment classifiers [14], which can be interpreted as an encapsulated representation of contexts. <NJ> is similar to $\lambda^{[]}$ in the sense that it employs classifier subtyping while it is closer to nominal subtyping. They suggested bounded polymorphism over classifiers as a potential extension of <NJ>, which will allow a type like $\forall\gamma.(\forall\delta > \gamma.\langle T_1\rangle^\delta \to \langle T_2\rangle^\delta) \to \langle T_1 \to T_2\rangle^\gamma$. Their bounded polymorphism is likely as expressive as polymorphic contexts of $\lambda_{\forall[]}$, and we are interested in the formal relation between them.


Pattern Matching against Code. Analytic metaprogramming that allows pattern matching against code values is considered beneficial and has been explored recently [18,28,9]. In particular, Moebius [9] provides a contextual modal type system capable of pattern matching against open code with polymorphic types. It should be feasible to extend $\lambda_{\forall[]}$ to allow pattern matching against code values, but it is left for future work.


Modal Types for Algebraic Effects and Handlers. ECMTT [32] is an interesting 
application of contextual modal types to algebraic effects and handlers [22]. It 
uses contexts to track effects of computations and use explicit substitutions to 
supply effect handlers. The authors mentioned that ECMTT needs some form of context polymorphism to support effect polymorphism. We expect the polymorphic context types in $\lambda_{\forall[]}$ will provide a basis for such an extension. Our formulation allows multiple occurrences of context variables; hence, we can describe a function that combines computations with different polymorphic effects, e.g., $\forall\gamma, \delta.[\gamma \vdash T] \to [\delta \vdash T] \to [\gamma, \delta \vdash T]$.


Linear-Time Temporal Types. There are several attempts at revealing the rela- 
tion between contextual modal type theory and linear-time temporal type theory. 
However, not all of them achieved their goal. For example, Davies [5] pointed out that the translation from $\lambda^{poly}_{open}$ to $\lambda^\bigcirc$, proposed by Kim et al. [12], was not
sound for some cases. Puech [23] also claimed a sound translation from AẸ#” to 
$\lambda^\alpha$ [29], which is an extension of $\lambda^\bigcirc$ with environment classifiers, but it did not
work for some cases, either. His translation infers hidden contexts by introduc- 
ing logic variables for unknown contexts and collecting constraints on those logic 
variables through typing derivations. Consequently, the following judgment fails to translate because $f$ is used in two different scopes, and hence contradicting constraints for $f$ are generated.

$f :^0 \bigcirc T \to \bigcirc T,\; g :^0 \bigcirc T \to \bigcirc T \to \bigcirc T,\; x :^1 T$
$\vdash_0 g\,(\mathsf{quo}((\lambda z : T.\,\mathsf{unq}(f\,(\mathsf{quo}\,z)))\,x))\,(f\,(\mathsf{quo}\,x))$


These failing translations conversely indicate that the hypothesis by Davies [5] is right: a sound translation from $\lambda^\bigcirc$ requires a full form of context polymorphism as in our $\lambda_{\forall[]}$. Kameyama et al. [10] provided a sound translation from a 2-level fragment of $\lambda^\alpha$ to System F with products and a fixed point operator. Their translation uses polymorphic types to represent unknown contexts, similarly to our approach. However, their translation takes an approach different from ours. For example, a $\lambda^\bigcirc$ type $\bigcirc T \to \bigcirc T \to \bigcirc T$ is encoded to $\forall\gamma.([\gamma \vdash T] \to \forall\delta.([\gamma, \delta \vdash T] \to [\gamma, \delta \vdash T]))$ if we apply their approach to $\lambda_{\forall[]}$, whereas the same type is encoded to $(\forall\gamma.[\gamma \vdash T]) \to (\forall\gamma.[\gamma \vdash T]) \to [\varepsilon \vdash T]$ by the approach discussed in Section 6. There are two major differences between their approach and ours. Firstly, their translation needs to insert coercion functions that extend contexts in types in conjunction with polymorphic types. On the contrary, our approach achieves the same goal purely by polymorphic contexts, making the translation much more concise. Secondly, their source language supports richer expressions than $\lambda^\bigcirc$, including run-time evaluation and fixpoint. It is left for future work to figure out whether our approach can also embed such features of $\lambda^\alpha$ to $\lambda_{\forall[]}$.


8 Conclusion 


This paper has proposed a novel contextual modal type theory $\lambda_{\forall[]}$ with polymorphic contexts. It is novel in that it supports parametric polymorphic contexts and allows us to have multiple context variables in a single context. We have given its semantics by $\beta$-reduction and proved subject reduction, strong normalization, and confluence. We have also demonstrated a sound embedding from linear-time temporal type theory. We expect this result to show that $\lambda_{\forall[]}$ is expressive enough to describe programs with staged computation.

We regard this work as a first step to establishing a mature modal type theory that reasons about hygienic binding operations provided by procedural macros of Scheme, Racket, and several other languages. Future work includes formal reasoning about the relation between contextual modal types and refined environment classifiers and developing a contextual modal type theory that can express first-class variable names.
Abstract. Guarded Kleene Algebra with Tests (GKAT) is a fragment
of Kleene Algebra with Tests (KAT) that was recently introduced to 
reason efficiently about imperative programs. In contrast to KAT, GKAT 
does not have an algebraic axiomatization, but relies on an analogue of 
Salomaa’s axiomatization of Kleene Algebra. In this paper, we present an 
algebraic axiomatization and prove two completeness results for a large 
fragment of GKAT consisting of skip-free programs. 


1 Introduction 


Kleene algebra with tests (KAT) [25] is a logic for reasoning about semantics and 
equivalence of simple imperative programs. It extends Kleene Algebra (KA) with 
Boolean control flow, which enables encoding of conditionals and while loops. 

KAT has been applied to verification tasks. For example, it was used in proof- 
carrying Java programs [23], in compiler optimization [27], and file systems [8]. 
More recently, KAT was used for reasoning about packet-switched networks, 
serving as a core to NetKAT [4] and Probabilistic NetKAT [12,43]. 

The success of KAT in networking is partly due to its dual nature: it can be 
used to both specify and verify network properties. Moreover, the implementa- 
tions of NetKAT and ProbNetKAT were surprisingly competitive with state-of- 
the-art tools [13,44]. Part of the surprise with the efficiency of these implemen- 
tations is that the decision problem for equivalence in both KAT and NetKAT 
is PSPACE-complete [28,4]. Further investigations [42] revealed that the tasks 
performed in NetKAT only make use of a fragment of KAT. It turns out that 
the difficulty of deciding equivalence in KAT can largely be attributed to the 
non-deterministic nature of KAT programs. If one restricts to KAT programs 
that operate deterministically with respect to Boolean control flow, the associ- 
ated decision problem is almost linear. This fragment of KAT was first identified 
in [29] and further explored as guarded Kleene algebra with tests (GKAT) [42]. 

The study in [42] proved that the decision problem for GKAT programs is 
almost linear, and proposed an axiomatization of equivalence. However, the ax- 
iomatization suffered from a serious drawback: it included a powerful uniqueness 
of solutions axiom (UA), which greatly encumbers algebraic reasoning in prac-
tice. In order to use (UA) to show that a pair of programs are equivalent, one 
needs to find a system of equations that they both satisfy. Even more worry- 
ingly, the axiomatization contained a fixed-point axiom with a side condition 
reminiscent of Salomaa’s axiomatization for regular expressions, which is known 
to be non-algebraic and impair the use of the axiomatic reasoning in context (as 
substitution of atomic programs is not sound anymore). The authors of [42] left 
as open questions whether (UA) can be derived from the other GKAT axioms and 
whether the non-algebraic side condition can be removed. Despite the attention 
GKAT has received in recent literature [39,48,41], these questions remain open. 


In the present work, we offer a partial answer to the questions posed in [42]. 
We show that proving the validity of an equivalence in GKAT does not require 
(UA) if the pair of programs in question are of a particular form, what we call 
skip-free. This fragment of GKAT is expressive enough to capture a large class 
of programs, and it also provides a better basis for algebraic reasoning: we show 
that the side condition of the fixed-point axiom can be removed. Our inspiration 
to look at this fragment came from recent work of Grabmayer and Fokkink on
the axiomatization of 1-free star expressions modulo bisimulation [15,14], an im- 
portant stepping stone to solving a decades-open problem posed by Milner [32]. 

In a nutshell, our contribution is to identify a large fragment of GKAT, what 
we call the skip-free fragment, that admits an algebraic axiomatization. We ax- 
iomatize both bisimilarity and language semantics and provide two completeness 
proofs. The first proves completeness of skip-free GKAT modulo bisimulation [39], 
via a reduction to completeness of Grabmayer and Fokkink’s system [15]. The 
second proves completeness of skip-free GKAT w.r.t. language semantics via a 
reduction to skip-free GKAT modulo bisimulation. We also show that equivalence 
proofs of skip-free GKAT expressions (for both semantics) embed in full GKAT. 

The next section contains an introduction to GKAT and an overview of the 
open problems we tackle in the technical sections of the paper. 


2 Overview 


In this section we provide an overview of our results. We start with a motivating 
example of two imperative programs to discuss program equivalence as a verifi- 
cation technology. We then show how GKAT can be used to solve this problem 
and explore the open questions that we tackle in this paper. 


Equivalence for Verification. In the game Fizz! Buzz! [35], players sit in 
a circle taking turns counting up from one. Instead of saying any number that 
is a multiple of 3, players must say “fizz”, and multiples of 5 are replaced with 
“buzz”. If the number is a multiple of both 3 and 5, the player must say “fizz buzz”.

Imagine you are asked in a job interview to write a program that prints out 
the first 100 rounds of a perfect game of Fizz! Buzz!. You write the function 
fizzbuzz1 as given in Figure 1(i). Thinking about the interview later that day,
you look up a solution, and you find fizzbuzz2, depicted in Figure 1(ii). You 
def fizzbuzz1 =                      (i)
  n := 1;
  while n ≤ 100 do
    if 3|n then
      if not 5|n then
        print fizz; n++;
      else
        print fizzbuzz; n++;
    else if 5|n then
      print buzz; n++;
    else
      print n; n++;
  print done!;

def fizzbuzz2 =                      (ii)
  n := 1;
  while n ≤ 100 do
    if 5|n and 3|n then
      print fizzbuzz;
    else if 3|n then
      print fizz;
    else if 5|n then
      print buzz;
    else
      print n;
    n++;
  print done!;

Fig. 1. Two possible specifications of the ideal Fizz! Buzz! player.


suspect that fizzbuzz2 should do the same thing as fizzbuzz1, and after thinking 
it over for a few minutes, you realize your program could be transformed into the 
reference solution by a series of transformations that do not change its semantics: 


1. Place the common action n++ at the end of the loop. 
2. Replace not 5|n with 5|n and swap print fizz with print fizzbuzz. 
3. Merge the nested branches of 3|n and 5|n into one. 
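
As an informal sanity check, the two programs of Figure 1 can be transcribed into
Python and compared on their output. This is only a sketch under assumptions that
are not in the original: the loop guard is read as n ≤ 100, output is collected in
a list instead of being printed, and the function names are chosen for
illustration. Agreement on one run is much weaker than the algebraic argument
developed in the rest of this section.

```python
def fizzbuzz1():
    """Transcription of Figure 1(i): n++ is repeated in every branch."""
    out, n = [], 1
    while n <= 100:
        if n % 3 == 0:
            if not n % 5 == 0:
                out.append("fizz"); n += 1
            else:
                out.append("fizzbuzz"); n += 1
        elif n % 5 == 0:
            out.append("buzz"); n += 1
        else:
            out.append(str(n)); n += 1
    out.append("done!")
    return out


def fizzbuzz2():
    """Transcription of Figure 1(ii): n++ is factored out of the branches."""
    out, n = [], 1
    while n <= 100:
        if n % 5 == 0 and n % 3 == 0:
            out.append("fizzbuzz")
        elif n % 3 == 0:
            out.append("fizz")
        elif n % 5 == 0:
            out.append("buzz")
        else:
            out.append(str(n))
        n += 1
    out.append("done!")
    return out


# Both transcriptions produce the same sequence of outputs.
print(fizzbuzz1() == fizzbuzz2())
```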


Feeling somewhat more reassured, you ponder the three steps above. It seems 
like their validity is independent of the actual tests and actions performed by the 
code; for example, swapping the branches of an if-then-else block while negating
the test should be valid under any circumstances. This raises the question:
is there a family of primitive transformations that can be used to derive valid 
ways of rearranging imperative programs? Furthermore, is there an algorithm to 
decide whether two programs are equivalent under these laws? 


Enter GKAT. Guarded Kleene Algebra with Tests (GKAT) [42] has been pro- 
posed as a way of answering the questions above. Expressions in the language 
of GKAT model skeletons of imperative programs, where the exact meaning of 
tests and actions is abstracted. The laws of GKAT correspond to program trans- 
formations that are valid regardless of the semantics of tests and actions. 

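Formally, GKAT expressions are captured by a two-level grammar, generated
by a finite set of tests $T$ and a finite set of actions $\Sigma$, as follows:

\[
\begin{aligned}
\mathrm{BExp} \ni b, c &::= 0 \mid 1 \mid t \in T \mid b \vee c \mid b \wedge c \mid \bar{b} \\
\mathrm{GExp} \ni e, f &::= p \in \Sigma \mid b \mid e +_b f \mid e \cdot f \mid e^{(b)}
\end{aligned}
\]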


BExp is the set of Boolean expressions, built from 0 (false), 1 (true), and primitive
tests from $T$, and composed using $\vee$ (or), $\wedge$ (and) and $\bar{\ }$ (not). GExp is the set
of GKAT expressions, built from tests (assert statements) and primitive actions
$p \in \Sigma$. Here, $e +_b f$ is a condensed way of writing ‘if b then e else f’, and $e^{(b)}$
is shorthand for ‘while b do e’; the operator $\cdot$ models sequential composition. By
convention, the sequence operator $\cdot$ takes precedence over the operator $+_b$.
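Example 2.1. Abbreviating statements of the form print foo by simply writing
foo, Figure 1(i) can be rendered as the GKAT expression

\[
(n := 1) \cdot \Big( (\mathtt{fizz} \cdot \mathtt{n{+}{+}} +_{\overline{5|n}} \mathtt{fizzbuzz} \cdot \mathtt{n{+}{+}}) +_{3|n} (\mathtt{buzz} \cdot \mathtt{n{+}{+}} +_{5|n} n \cdot \mathtt{n{+}{+}}) \Big)^{(n \le 100)} \cdot \mathtt{done!} \tag{1}
\]

Similarly, the program in Figure 1(ii) gives the GKAT expression

\[
(n := 1) \cdot \Big( \big(\mathtt{fizzbuzz} +_{5|n \wedge 3|n} (\mathtt{fizz} +_{3|n} (\mathtt{buzz} +_{5|n} n))\big) \cdot \mathtt{n{+}{+}} \Big)^{(n \le 100)} \cdot \mathtt{done!} \tag{2}
\]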


Semantics. A moment ago, we stated that GKAT equivalences are intended to 
witness program equivalence, regardless of how primitive tests and actions are 
interpreted. We make this more precise by recalling the relational semantics of 
GKAT programs [42]. The intuition behind this semantics is that if the possible 
states of the machine being programmed are modelled by some set S, then tests 
are predicates on S (comprised of all states where the test succeeds), and actions 
are relations on S (encoding the changes in state effected by the action).


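Definition 2.2 ([42]). A (relational) interpretation is a triple $\sigma = (S, \mathrm{eval}, \mathrm{sat})$
where $S$ is a set, $\mathrm{eval} : \Sigma \to \mathcal{P}(S \times S)$ and $\mathrm{sat} : T \to \mathcal{P}(S)$. Each relational
interpretation $\sigma$ gives rise to a semantics $[\![-]\!]_\sigma : \mathrm{GExp} \to \mathcal{P}(S \times S)$, as follows:

\[
\begin{aligned}
[\![0]\!]_\sigma &= \emptyset & [\![\bar{b}]\!]_\sigma &= [\![1]\!]_\sigma \setminus [\![b]\!]_\sigma \\
[\![1]\!]_\sigma &= \{(s, s) : s \in S\} & [\![p]\!]_\sigma &= \mathrm{eval}(p) \\
[\![t]\!]_\sigma &= \{(s, s) : s \in \mathrm{sat}(t)\} & [\![e +_b f]\!]_\sigma &= [\![b]\!]_\sigma \circ [\![e]\!]_\sigma \cup [\![\bar{b}]\!]_\sigma \circ [\![f]\!]_\sigma \\
[\![b \wedge c]\!]_\sigma &= [\![b]\!]_\sigma \circ [\![c]\!]_\sigma & [\![e \cdot f]\!]_\sigma &= [\![e]\!]_\sigma \circ [\![f]\!]_\sigma \\
[\![b \vee c]\!]_\sigma &= [\![b]\!]_\sigma \cup [\![c]\!]_\sigma & [\![e^{(b)}]\!]_\sigma &= ([\![b]\!]_\sigma \circ [\![e]\!]_\sigma)^* \circ [\![\bar{b}]\!]_\sigma
\end{aligned}
\]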


Here we use $\circ$ for relation composition and ${}^*$ for reflexive transitive closure.
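
The clauses of Definition 2.2 can be executed directly when the state set is
finite. The following is a minimal sketch, not taken from the original: the
nested-tuple encoding of expressions, the names `compose`, `star` and `sem`, and
the toy interpretation (a counter that can be incremented up to 3) are all
assumptions made for illustration. Composition is read in diagrammatic order
(first the left relation, then the right one), which is the reading under which
the clause for $e +_b f$ tests $b$ before running $e$.

```python
def compose(r, s):
    """Relation composition in diagrammatic order: first r, then s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def star(r, states):
    """Reflexive transitive closure of r on a finite state set."""
    closure = {(s, s) for s in states}
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger


def sem(e, states, ev, sat):
    """Relational semantics of a GKAT expression encoded as nested tuples."""
    go = lambda x: sem(x, states, ev, sat)
    ident = {(s, s) for s in states}
    op = e[0]
    if op == "0":
        return set()
    if op == "1":
        return ident
    if op == "test":
        return {(s, s) for s in sat[e[1]]}
    if op == "not":
        return ident - go(e[1])
    if op == "and":
        return compose(go(e[1]), go(e[2]))
    if op == "or":
        return go(e[1]) | go(e[2])
    if op == "act":
        return set(ev[e[1]])
    if op == "seq":
        return compose(go(e[1]), go(e[2]))
    if op == "if":      # ("if", b, e, f) encodes e +_b f
        b, left, right = e[1], e[2], e[3]
        return compose(go(b), go(left)) | compose(go(("not", b)), go(right))
    if op == "while":   # ("while", b, e) encodes e^(b)
        b, body = e[1], e[2]
        return compose(star(compose(go(b), go(body)), states), go(("not", b)))
    raise ValueError(f"unknown operator: {op}")


# Toy interpretation (hypothetical): states 0..3, one action, one test.
S = range(4)
ev = {"inc": {(m, m + 1) for m in range(3)}}
sat = {"lt3": {0, 1, 2}}
loop = ("while", ("test", "lt3"), ("act", "inc"))
print(sorted(sem(loop, S, ev, sat)))
```

Under this toy interpretation the loop relates every start state to the state 3,
as one would expect from 'while n < 3 do n++'.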


Remark 2.3. If $\mathrm{eval}(p)$ is a partial function for every $p \in \Sigma$, then so is $[\![e]\!]_\sigma$ for
each $e$. The above therefore also yields a semantics in terms of partial functions.


The relation $[\![e]\!]_\sigma$ contains the possible pairs of start and end states of the
program $e$. For instance, the input-output relation of $[\![e +_b f]\!]_\sigma$ consists of the
pairs in $[\![e]\!]_\sigma$ (resp. $[\![f]\!]_\sigma$) where the start state satisfies $b$ (resp. violates $b$).


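Example 2.4. We could model the states of the machine running Fizz! Buzz! as
pairs $(m, \ell)$, where $m$ is the current value of the counter $n$, and $\ell$ is a list of
words printed so far; the accompanying maps sat and eval are given by:

\[
\begin{aligned}
\mathrm{sat}(k|n) &= \{(m, \ell) \in S : m \equiv 0 \bmod k\} \\
\mathrm{sat}(n \le k) &= \{(m, \ell) \in S : m \le k\} \\
\mathrm{eval}(\mathtt{n{+}{+}}) &= \{((m, \ell), (m + 1, \ell)) : (m, \ell) \in S\} \\
\mathrm{eval}(n := k) &= \{((m, \ell), (k, \ell)) : (m, \ell) \in S\} \\
\mathrm{eval}(w) &= \{((m, \ell), (m, \ell w)) : (m, \ell) \in S\} \qquad (w \in \{\mathtt{fizz}, \mathtt{buzz}, \mathtt{fizzbuzz}\}) \\
\mathrm{eval}(n) &= \{((m, \ell), (m, \ell m)) : (m, \ell) \in S\}
\end{aligned}
\]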


5 A probabilistic semantics in terms of sub-Markov kernels is also possible [42]. 
For instance, the interpretation of n++ connects states of the form $(m, \ell)$ to states
of the form $(m + 1, \ell)$, incrementing the counter by one and leaving the output
unchanged. Similarly, print statements append the given string to the output.


On the one hand, this parameterized semantics shows that programs in the 
GKAT syntax can be given a semantics that corresponds to the intended meaning 
of their actions and tests. On the other hand, it allows us to quantify over all 
possible interpretations, and thus abstract from the meaning of the primitives. 

As it happens, two expressions have the same relational semantics under any 
interpretation if and only if they have the same language semantics [42], i.e., in 
terms of languages of guarded strings as used in KAT [25]. Since equivalence un- 
der the language semantics is efficiently decidable [42], so is equivalence under 
all relational interpretations. The decision procedure in [42] uses bisimulation 
and known results from automata theory. These techniques are good for
mechanization but hide the algebraic structure of programs. To expose this
structure, algebraic laws of GKAT program equivalence were studied.


Program transformations. GKAT programs are (generalized) regular expressions,
which are intuitive to reason about and for which many syntactic equivalences
are known and explored. In [42], a set of sound axioms $e \equiv f$ such that
$[\![e]\!]_\sigma = [\![f]\!]_\sigma$ for all $\sigma$ was proposed, and it was shown that these can be used to
prove a number of useful facts about programs. For instance, the following two 
equivalences are axioms of GKAT: 


\[
e \cdot g +_b f \cdot g \equiv (e +_b f) \cdot g
\qquad\qquad
e +_b f \equiv f +_{\bar{b}} e
\]


The first of these says that common code at the tail end of branches can be 
factored out, while the second says that the code in branches of a conditional 
can be swapped, as long as we negate the test. Returning to our running example, 
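if we apply the first law to (1) three times (once for each guarded choice), and
the second law once to replace the test $\overline{5|n}$ by $5|n$, we obtain

\[
(n := 1) \cdot \Big( \big((\mathtt{fizzbuzz} +_{5|n} \mathtt{fizz}) +_{3|n} (\mathtt{buzz} +_{5|n} n)\big) \cdot \mathtt{n{+}{+}} \Big)^{(n \le 100)} \cdot \mathtt{done!} \tag{3}
\]

Finally, we can apply $(e +_b f) +_c (g +_b h) \equiv e +_{b \wedge c} (f +_c (g +_b h))$, which is
provable from the axioms of GKAT, to transform (3) into (2).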
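
The derived law used in the last step can be sanity-checked by brute force over
truth assignments. The sketch below is an illustration only: it encodes nested
guarded choices as tuples (guard, then-branch, else-branch) with string leaves,
all of which are assumed names, and it merely confirms that both sides select
the same leaf under every valuation of $b$ and $c$. It is not a derivation from
the axioms.

```python
from itertools import product


def pick(expr, val):
    """Return the leaf selected by a nested guarded choice under valuation val."""
    if isinstance(expr, str):
        return expr
    guard, then_branch, else_branch = expr
    return pick(then_branch if guard(val) else else_branch, val)


# Guards are predicates on a valuation of the two tests b and c.
b = lambda v: v["b"]
c = lambda v: v["c"]
b_and_c = lambda v: v["b"] and v["c"]

# (e +_b f) +_c (g +_b h)
lhs = (c, (b, "e", "f"), (b, "g", "h"))
# e +_{b and c} (f +_c (g +_b h))
rhs = (b_and_c, "e", (c, "f", (b, "g", "h")))


def same_branches(x, y):
    """Check that x and y pick the same leaf under all four valuations."""
    return all(
        pick(x, {"b": vb, "c": vc}) == pick(y, {"b": vb, "c": vc})
        for vb, vc in product([False, True], repeat=2)
    )


print(same_branches(lhs, rhs))
```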

Being able to transform one GKAT program into another using the axioms of 
GKAT is useful, but the question arises: do the axioms capture all equivalences 
that hold? More specifically, are the axioms of GKAT powerful enough to prove 
that $e \equiv f$ whenever $[\![e]\!]_\sigma = [\![f]\!]_\sigma$ holds for all $\sigma$?

In [42], a partial answer to the above question is provided: if we extend the 
laws of GKAT with the uniqueness axiom (UA), then the resulting set of axioms
is sound and complete w.r.t. the language semantics. The problem with this is 
that (UA) is not really a single axiom, but rather an axiom scheme, which makes 
both its presentation and application somewhat unwieldy. 

To properly introduce (UA), we need the following notion. 
Definition 2.5. A left-affine system is defined by expressions $e_{11}, \dots, e_{nn} \in \mathrm{GExp}$
and $f_1, \dots, f_n \in \mathrm{GExp}$, along with tests $b_{11}, \dots, b_{nn} \in \mathrm{BExp}$. A sequence
of expressions $s_1, \dots, s_n \in \mathrm{GExp}$ is said to be a solution to this system if

\[
s_i \equiv e_{i1} \cdot s_1 +_{b_{i1}} e_{i2} \cdot s_2 +_{b_{i2}} \cdots +_{b_{i(n-1)}} e_{in} \cdot s_n +_{b_{in}} f_i \qquad (\forall i \le n)
\]

Here, the operations $+_{b_{ij}}$ associate to the right.

A left-affine system is called guarded if no ei; that appears in the system 
successfully terminates after reading an atomic test. In other words, each coeffi- 
cient denotes a productive program, meaning it must execute some action before 
successfully terminating—we refer to Section 7.8 for more details. 


Stated fully, (UA) says that if expressions $s_1, \dots, s_n$ and $t_1, \dots, t_n$ are solutions
to the same guarded left-affine system, then $s_i \equiv t_i$ for $1 \le i \le n$.

On top of the infinitary nature of (UA), the side condition demanding guard- 
edness prevents purely algebraic reasoning: replacing action symbols in a valid 
GKAT equation with arbitrary GKAT expressions might yield an invalid equa- 
tion! The situation is analogous to the empty word property used by Salomaa [37] 
to axiomatize equivalence of regular expressions. The side condition of guarded- 
ness appearing in (UA) is inherited from another axiom of GKAT, the fixed-point 
axiom, which in essence is the unary version of this axiom scheme and explicitly 
defines the solution of one guarded left-affine equation as a while loop. 


\[
g \equiv e \cdot g +_b f \implies g \equiv e^{(b)} \cdot f \qquad \text{if $e$ is guarded.}
\]


Remark 2.6. Part of the problem of the uniqueness axiom is that the case for 
general n does not seem to follow easily from the case where n = 1. The problem 
here is that, unlike the analogous situation for Kleene algebra, there is no general 
method to transform a left-affine system with n + 1 unknowns into one with n 
unknowns [29], even if this is possible in certain cases [42]. 


The open questions. We are motivated by two open questions from [42]: 


— First, can the uniqueness axiom be eliminated? The other axioms of GKAT 
contain the instantiation of (UA) for n = 1, which has so far been sufficient 
in all handwritten proofs of equivalence that we know. Yet (UA) seems to be 
necessary in both known completeness proofs. 

— Second, can we eliminate the guardedness side condition? Kozen [24] showed 
that Salomaa’s axiomatization is subsumed by a set of axioms that together 
imply existence and uniqueness of least solutions to systems of equations, 
but this approach has not yet borne fruit in GKAT. 


This paper. Our main contribution is to show that, in a particular fragment 
of GKAT, both questions can be answered in the positive (see Figure 2). 

In Section 3, we present what we call the skip-free fragment of GKAT, con- 
sisting of programs that do not contain assert statements in the body (other than 
assert false); in other words, Boolean statements are restricted to control state- 
ments. For this fragment, we show that the axiom scheme (UA) can be avoided 
Guarded Union:
\[
x +_b x \equiv x \qquad x \equiv x +_1 y \qquad x +_b y \equiv y +_{\bar{b}} x \qquad x +_b (y +_c z) \equiv (x +_b y) +_{b \vee c} z
\]
Sequencing:
\[
0x \equiv 0 \qquad x0 \equiv 0 \;\; (\dagger) \qquad x(yz) \equiv (xy)z \qquad (x +_b y)z \equiv xz +_b yz
\]
Loops:
\[
x^{(b)} y \equiv x(x^{(b)} y) +_b y \qquad \frac{z \equiv xz +_b y}{z \equiv x^{(b)} y}
\]


Fig. 2. Axioms for the language semantics of skip-free GKAT (in addition to Boolean
algebra axioms for tests, see Fig. 3). If the axiom marked (†) is omitted, the above
axiomatize a finer semantics, bisimilarity.


entirely. In fact, this is true for language semantics (as first introduced in [42]) 
as well as for the bisimulation semantics of [39]. 

In Section 4, we provide a bridge to a recent result in process algebra. In the 
80s, Milner offered an alternative interpretation of regular expressions [32], as 
what he called star behaviours. Based on work of Salomaa from the 1960s [37], 
Milner proposed a sound axiomatization of the algebra of star behaviours, but left 
completeness an open problem. After 38 years, it was recently solved by Clemens 
Grabmayer [14] following up on his joint work with Wan Fokkink showing that 
a suitable restriction of Milner’s axioms is complete for the one-free fragment 
of regular expressions modulo bisimulation [15]. We leverage their work with an 
interesting embedding of skip-free GKAT into the one-free regular expressions. 

This leads to two completeness results. In Section 5, we start by focusing 
on the bisimulation semantics of the skip-free fragment, and then in Section 6 
expand our argument to its language semantics. More precisely, we first provide 
a reduction of the completeness of skip-free GKAT up to bisimulation to the 
completeness of Grabmayer and Fokkink’s 1-free regular expressions modulo 
bisimulation [15]. We then provide a reduction of the completeness of skip-free 
GKAT modulo language semantics to the completeness of skip-free GKAT modulo 
bisimulation via a technique inspired by the tree pruning approach of [39]. 

Finally, in Section 7, we connect our semantics of skip-free GKAT expressions 
to the established semantics of full GKAT. We also connect the syntactic proofs 
between skip-free GKAT expressions in both our axiomatization and the existing 
one. In conjunction with the results of Sections 5 and 6, the results in Section 7 
make a significant step towards answering the question of whether the axioms 
of GKAT give a complete description of program equivalence, in the positive. 

Proofs appear in the full version [40]. 


3 Introducing Skip-free GKAT 


The axiom scheme (UA) can be avoided entirely in a certain fragment of GKAT, 
both for determining bisimilarity and language equivalence. In this section, we 
give a formal description of the expressions in this fragment and their semantics. 


Skip-free expressions. The fragment of GKAT in focus is the one that excludes 
sub-programs that may accept immediately, without performing any action. Since 
these programs can be “skipped” under certain conditions, we call the fragment 
Fig. 3. The axioms of Boolean algebra [18].


that avoids them skip-free. Among others, it prohibits sub-programs of the form 
assert b for b ≠ false, but also while false do p, which is equivalent to assert true.


Definition 3.1. Given a set $\Sigma$ of atomic actions, the set $\mathrm{GExp}^-$ of skip-free
GKAT expressions is given by the grammar

\[
\mathrm{GExp}^- \ni e_1, e_2 ::= 0 \mid p \in \Sigma \mid e_1 +_b e_2 \mid e_1 \cdot e_2 \mid e_1^{(b)} e_2
\]

where $b$ ranges over the Boolean algebra expressions BExp.


Unlike full GKAT, in skip-free GKAT the loop construct is treated as a binary 
operation, analogous to Kleene’s original star operation [22], which was also 
binary. This helps us avoid loops of the form $e^{(b)}$, which can be skipped when $b$
does not hold. The expression $e_1^{(b)} e_2$ corresponds to $e_1^{(b)} \cdot e_2$ in GKAT.

Example 3.2. Using the same notational shorthand as in Example 2.1, the block 
of code in Figure 1(ii) can be cast as the skip-free GKAT expression 


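\[
(n := 1) \cdot \Big( \big(\mathtt{fizzbuzz} +_{3|n \wedge 5|n} (\mathtt{fizz} +_{3|n} (\mathtt{buzz} +_{5|n} n))\big) \cdot \mathtt{n{+}{+}} \Big)^{(n \le 100)} (\mathtt{done!})
\]


Note how we use a skip-free loop of the form $e_1^{(b)} e_2$ instead of the looping
construct $e_1^{(b)}$ before concatenating with $e_2$, as was done for GKAT.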


3.1 Skip-free Semantics 


There are three natural ways to interpret skip-free GKAT expressions: as
automata, as behaviours, and as languages.⁶ After a short note on Boolean algebra,
we shall begin with the automaton interpretation, also known as the small-step 
semantics, from which the other two can be derived. 


Boolean algebra. To properly present our automata, we need to introduce one 
more notion. Boolean expressions BExp are a syntax for elements of a Boolean 
algebra, an algebraic structure satisfying the equations in Fig. 3. When a Boolean 
algebra is freely generated from a finite set of basic tests (T in the case of 
BExp), it has a finite set At of nonzero minimal elements called atoms. Atoms
are in one-to-one correspondence with sets of tests, and the Boolean algebra is
isomorphic to $\mathcal{P}(\mathrm{At})$, the set of subsets of At, equipped with $\vee = \cup$, $\wedge = \cap$,
and $\overline{(-)} = \mathrm{At} \setminus (-)$. In the context of programming, one can think of an atom
as a complete description of the machine state, saying which tests are true and
which are false. We will denote atoms by the Greek letters $\alpha$ and $\beta$, sometimes
with indices. Given a Boolean expression $b \in \mathrm{BExp}$ and an atom $\alpha \in \mathrm{At}$ we say
that $\alpha$ entails $b$, written $\alpha \le b$, whenever $\bar{\alpha} \vee b = 1$, or equivalently $\alpha \vee b = b$.
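
To make the correspondence between atoms and sets of tests concrete, the
following sketch enumerates the atoms generated by a finite set of tests and
computes, for a Boolean expression, the set of atoms that entail it. The tuple
encoding of Boolean expressions, the function names, and the two sample tests
are assumptions for illustration; an atom is represented as a dictionary that
assigns a truth value to every primitive test.

```python
from itertools import product

TESTS = ["3|n", "5|n"]  # hypothetical finite set T of primitive tests


def atoms(tests):
    """Each atom fixes a truth value for every primitive test."""
    return [dict(zip(tests, bits))
            for bits in product([False, True], repeat=len(tests))]


def holds(b, atom):
    """Evaluate a Boolean expression at an atom, i.e. decide atom <= b."""
    if b == "0":
        return False
    if b == "1":
        return True
    if isinstance(b, str):          # a primitive test
        return atom[b]
    op = b[0]
    if op == "not":
        return not holds(b[1], atom)
    if op == "and":
        return holds(b[1], atom) and holds(b[2], atom)
    if op == "or":
        return holds(b[1], atom) or holds(b[2], atom)
    raise ValueError(f"unknown operator: {op}")


def denotation(b, tests):
    """Indices of the atoms entailing b: the image of b in P(At)."""
    return {i for i, a in enumerate(atoms(tests)) if holds(b, a)}


print(len(atoms(TESTS)))
print(denotation(("and", "3|n", "5|n"), TESTS))
```

With two primitive tests there are four atoms, and the conjunction of both tests
is entailed by exactly one of them.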


6 We will connect these to the relational semantics from Definition 2.2 in Section 7. 
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e1 SP, e a<b ez SIP, e' akb ey 2P, e' 
ply e1 ty e2 2P, e' e1 +o e2 2P, e' erez 2P, e'ez 
nd ey 2, e a<b ey 2P, y a<b ez SIP, e' agb 
e1e2 ole, ez ees ole, e "(ey Oe 2) ees ale, eez eP ez alp e 


Fig. 4. The small-step semantics of skip-free GKAT expressions. 


Automata. Throughout the paper, we use the notation $\bullet + S$, where $S$ is a set
and $\bullet$ is a symbol, to denote the disjoint union (coproduct) of $\{\bullet\}$ and $S$.

The small-step semantics of a skip-free GKAT expression uses a special type
of deterministic automaton. A skip-free automaton is a pair $(X, h)$, where $X$ is
a set of states and $h : X \to (\bot + \Sigma \times (\checkmark + X))^{\mathrm{At}}$ is a transition structure. At
every $x \in X$ and for any $\alpha \in \mathrm{At}$, one of three things can happen:

1. $h(x)(\alpha) = (p, y)$, which we write as $x \xrightarrow{\alpha|p} y$, means the state $x$ under $\alpha$
makes a transition to a new state $y$, after performing the action $p$;

2. $h(x)(\alpha) = (p, \checkmark)$, which we write $x \xrightarrow{\alpha|p} \checkmark$, means the state $x$ under $\alpha$
successfully terminates with action $p$;

3. $h(x)(\alpha) = \bot$, which we write $x \downarrow \alpha$, means the state $x$ under $\alpha$ terminates
with failure. Often we will leave these outputs implicit.


Definition 3.3 (Automaton of expressions). We equip the set $\mathrm{GExp}^-$ of
all skip-free GKAT expressions with an automaton structure $(\mathrm{GExp}^-, \partial)$ given in
Fig. 4, representing step-by-step execution. Given $e \in \mathrm{GExp}^-$, we denote the set
of states reachable from $e$ by $\langle e \rangle$ and call this the small-step semantics of $e$.


The small-step semantics of skip-free GKAT expressions is inspired by Brzo- 
zowski’s derivatives [7], which provide an automata-theoretic description of the 
step-by-step execution of a regular expression. Our first lemma tells us that, like 
regular expressions, skip-free GKAT expressions correspond to finite automata. 


Lemma 3.4. For any $e \in \mathrm{GExp}^-$, $\langle e \rangle$ has finitely many states.
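
The rules of Fig. 4 translate into a small recursive step function, and the
reachable states of an expression can then be collected by a worklist search.
The sketch below is an assumed encoding, not the original's: expressions are
nested tuples, a test is the frozenset of atoms entailing it, atoms are
frozensets of the primitive tests that hold, and the tests a, b, c abbreviate
n ≤ 100, 3|n and 5|n. It builds the expression for fizzbuzz2 and counts its
reachable states.

```python
from itertools import combinations

TICK = "✓"  # successful termination

TESTS = ("a", "b", "c")  # a: n <= 100, b: 3|n, c: 5|n
ATOMS = [frozenset(s) for k in range(len(TESTS) + 1)
         for s in combinations(TESTS, k)]


def sat(t):
    """The Boolean expression t, represented by the atoms that entail it."""
    return frozenset(x for x in ATOMS if t in x)


def step(e, a):
    """None (failure), (p, TICK) or (p, e'), following the rules of Fig. 4."""
    kind = e[0]
    if kind == "0":
        return None
    if kind == "act":
        return (e[1], TICK)
    if kind == "ite":                       # ("ite", b, e1, e2) is e1 +_b e2
        _, b, e1, e2 = e
        return step(e1, a) if a in b else step(e2, a)
    if kind == "seq":
        _, e1, e2 = e
        r = step(e1, a)
        if r is None:
            return None
        p, nxt = r
        return (p, e2) if nxt == TICK else (p, ("seq", nxt, e2))
    if kind == "loop":                      # ("loop", b, e1, e2) is e1^(b) e2
        _, b, e1, e2 = e
        if a not in b:
            return step(e2, a)
        r = step(e1, a)
        if r is None:
            return None
        p, nxt = r
        return (p, e) if nxt == TICK else (p, ("seq", nxt, e))
    raise ValueError(kind)


def reachable(e, atoms):
    """All expressions reachable from e by repeated steps."""
    seen, todo = {e}, [e]
    while todo:
        x = todo.pop()
        for a in atoms:
            r = step(x, a)
            if r is not None and r[1] != TICK and r[1] not in seen:
                seen.add(r[1])
                todo.append(r[1])
    return seen


A, B, C = sat("a"), sat("b"), sat("c")
act = lambda p: ("act", p)
choice = ("ite", B & C, act("fizzbuzz"),
          ("ite", B, act("fizz"), ("ite", C, act("buzz"), act("n"))))
e1 = ("loop", A, ("seq", choice, act("n++")), act("done!"))
e = ("seq", act("n := 1"), e1)

print(len(reachable(e, ATOMS)))
```

The search finds three expression states for fizzbuzz2, in line with the
finiteness claim of Lemma 3.4.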


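Example 3.5. The automaton that arises from the program fizzbuzz2 is below,
with $a = (n \le 100)$, $b = 3|n$, and $c = 5|n$. The expression $e$ is the same as in
Example 3.2, $e_1$ is the same as $e$ but without the action $n := 1$ in front, and
$e_2 = \mathtt{n{+}{+}} \cdot e_1$. We also adopt the convention of writing $x \xrightarrow{b|p} x'$ where $b \in \mathrm{BExp}$
to represent all transitions $x \xrightarrow{\alpha|p} x'$ where $\alpha \le b$.


[Automaton diagram not reproduced. Recoverable transition labels: $abc \mid \mathtt{fizzbuzz}$,
$ab\bar{c} \mid \mathtt{fizz}$, $a\bar{b}c \mid \mathtt{buzz}$, $a\bar{b}\bar{c} \mid n$, and $\bar{a} \mid \mathtt{done!}$, with target $\checkmark$ also appearing.]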
The automaton interpretation of a skip-free GKAT expression (its small-step
semantics) provides an intuitive visual depiction of the details of its execution. 
This is a useful view on the operational semantics of expressions, but sometimes 
one might want to have a more precise description of the global behaviour of the 
program. The remaining two interpretations of skip-free GKAT expressions aim 
to capture two denotational semantics of expressions: one finer, bisimilarity, that 
makes a distinction on the branching created by how its states respond to atomic 
tests, which actions can be performed, and when successful termination and 
crashes occur; another coarser, language semantics, that assigns a language of 
traces to each expression capturing all sequences of actions that lead to successful 
termination. The key difference between these two semantics is their ability to 
distinguish programs that crash early in the execution versus programs that 
crash later—this is evident in the axiomatizations of both semantics. We start by 
presenting the language semantics as this is the more traditional one associated 
with GKAT (and regular) expressions. 


Language semantics. Formally, a (skip-free) guarded trace is a nonempty
string of the form $\alpha_1 p_1 \cdots \alpha_n p_n$, where each $\alpha_i \in \mathrm{At}$ and $p_i \in \Sigma$. Intuitively,
each $\alpha_i$ captures the state of program variables needed to execute program
action $p_i$, and the execution of each $p_i$ except the last yields a new program state
$\alpha_{i+1}$. A skip-free guarded language is a set of guarded traces.

Skip-free guarded languages should be thought of as sets of strings denoting 
successfully terminating computations. 


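Definition 3.6 (Language acceptance). In a skip-free automaton $(X, h)$ with
a state $x \in X$, the language accepted by $x$ is the skip-free guarded language

\[
\mathcal{L}(x, (X, h)) = \{ \alpha_1 p_1 \cdots \alpha_n p_n \mid x \xrightarrow{\alpha_1|p_1} x_1 \to \cdots \to x_{n-1} \xrightarrow{\alpha_n|p_n} \checkmark \}
\]

If $(X, h)$ is clear from context, we will simply write $\mathcal{L}(x)$ instead of $\mathcal{L}(x, (X, h))$.
If $\mathcal{L}(x) = \mathcal{L}(y)$, we write $x \sim_{\mathcal{L}} y$ and say that $x$ and $y$ are language equivalent.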


Each skip-free GKAT expression is a state in the automaton of expressions 
(Definition 3.3) and therefore accepts a language. The language accepted by a 
skip-free GKAT expression is the set of successful runs of the program it denotes. 
Analogously to GKAT, we can describe this language inductively. 


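Lemma 3.7. Given an expression $e \in \mathrm{GExp}^-$, the language accepted by $e$ in
$(\mathrm{GExp}^-, \partial)$, i.e., $\mathcal{L}(e) = \mathcal{L}(e, (\mathrm{GExp}^-, \partial))$, can be characterized as follows:

\[
\begin{aligned}
\mathcal{L}(0) &= \emptyset &
\mathcal{L}(p) &= \{\alpha p \mid \alpha \in \mathrm{At}\} &
\mathcal{L}(e_1 +_b e_2) &= b\mathcal{L}(e_1) \cup \bar{b}\mathcal{L}(e_2) \\
\mathcal{L}(e_1 \cdot e_2) &= \mathcal{L}(e_1) \cdot \mathcal{L}(e_2) &
\mathcal{L}(e_1^{(b)} e_2) &= \bigcup_{n \in \mathbb{N}} (b\mathcal{L}(e_1))^n \cdot \bar{b}\mathcal{L}(e_2)
\end{aligned}
\]

Here, we write $bL = \{\alpha p w \in L \mid \alpha \le b\}$ and $L_1 \cdot L_2 = \{wx : w \in L_1, x \in L_2\}$,
while $L^0 = \{\epsilon\}$ (where $\epsilon$ denotes the empty word) and $L^{n+1} = L \cdot L^n$.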
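
Since loops can produce infinitely many traces, a direct implementation of
Lemma 3.7 has to truncate somewhere. The sketch below computes all guarded
traces with at most k actions; the bound k, the tuple encoding of expressions,
the representation of a test as the set of atoms entailing it, and the helper
names are all assumptions made for illustration.

```python
def guard(b, traces):
    """bL: keep the traces whose first atom entails b."""
    return {w for w in traces if w[0][0] in b}


def concat(l1, l2, k):
    """L1 . L2, truncated to traces with at most k actions."""
    return {w + x for w in l1 for x in l2 if len(w) + len(x) <= k}


def lang(e, atoms, k):
    """Guarded traces of e with at most k actions, following Lemma 3.7."""
    kind = e[0]
    if kind == "0":
        return set()
    if kind == "act":
        return {((a, e[1]),) for a in atoms} if k >= 1 else set()
    if kind == "ite":                       # ("ite", b, e1, e2) is e1 +_b e2
        _, b, e1, e2 = e
        not_b = set(atoms) - set(b)
        return guard(b, lang(e1, atoms, k)) | guard(not_b, lang(e2, atoms, k))
    if kind == "seq":
        return concat(lang(e[1], atoms, k), lang(e[2], atoms, k), k)
    if kind == "loop":                      # ("loop", b, e1, e2) is e1^(b) e2
        _, b, e1, e2 = e
        body = guard(b, lang(e1, atoms, k))
        layer = guard(set(atoms) - set(b), lang(e2, atoms, k))
        result = set()
        while layer:                        # layer n is (bL(e1))^n . (not b)L(e2)
            result |= layer
            layer = concat(body, layer, k)
        return result
    raise ValueError(kind)


ATOMS = ["α", "β"]                           # hypothetical two-atom algebra
p, q = ("act", "p"), ("act", "q")
while_alpha = ("loop", frozenset({"α"}), p, q)
print(sorted(lang(while_alpha, ATOMS, 2)))
```

The loop over `layer` terminates because every trace in `body` is nonempty, so
each round strictly lengthens the traces until the bound k is exceeded.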


Lemma 3.7 provides a way of computing the language of an expression e 
without having to generate the automaton for e. 
Bisimulation semantics. Another, finer, notion of equivalence that we can
associate with skip-free automata is bisimilarity. 


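Definition 3.8. Given skip-free automata $(X, h)$ and $(Y, k)$, a bisimulation is
a relation $R \subseteq X \times Y$ such that for any $x \mathrel{R} y$, $\alpha \in \mathrm{At}$ and $p \in \Sigma$:


1. $x \downarrow \alpha$ if and only if $y \downarrow \alpha$,
2. $x \xrightarrow{\alpha|p} \checkmark$ if and only if $y \xrightarrow{\alpha|p} \checkmark$, and
3. if $x \xrightarrow{\alpha|p} x'$ for some $x' \in X$, then $y \xrightarrow{\alpha|p} y'$ for some $y' \in Y$ with $x' \mathrel{R} y'$.


We call $x$ and $y$ bisimilar if $x \mathrel{R} y$ for some bisimulation $R$ and write $x \mathrel{\underline{\leftrightarrow}} y$.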


In a fixed skip-free automaton $(X, h)$, we define ${\underline{\leftrightarrow}} \subseteq X \times X$ to be the
largest bisimulation, called bisimilarity. This is an equivalence relation and a
bisimulation.⁷ The bisimilarity equivalence class of a state is often called its
behaviour.


Example 3.9. In the automaton below, $x_1$ and $x_2$ are bisimilar. This is witnessed
by the bisimulation $\{(x_1, x_2), (x_2, x_2)\}$.


We can also use bisimulations to witness language equivalence. 
Lemma 3.10. Let $e_1, e_2 \in \mathrm{GExp}^-$. If $e_1 \mathrel{\underline{\leftrightarrow}} e_2$, then $\mathcal{L}(e_1) = \mathcal{L}(e_2)$.


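The converse of Lemma 3.10 is not true. Consider, for example, the program
$p^{(1)} q$ that repeats the atomic action $p \in \Sigma$ indefinitely, never reaching $q$. Since

\[
\mathcal{L}(p^{(1)} q) = \bigcup_{n \in \mathbb{N}} \mathcal{L}(p)^n \cdot \emptyset = \emptyset = \mathcal{L}(0)
\]

we know that $p^{(1)} q \sim_{\mathcal{L}} 0$. But $p^{(1)} q$ and $0$ are not bisimilar, since Fig. 4 tells us
that $p^{(1)} q \xrightarrow{\alpha|p} p^{(1)} q$ and $0 \downarrow \alpha$, which together refute Definition 3.8.1.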
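
On a finite skip-free automaton, bisimilarity can be computed as a greatest
fixpoint: start from the full relation and repeatedly discard pairs that violate
one of the conditions of Definition 3.8. The sketch below assumes a hypothetical
encoding in which `h[x][α]` is `None` for failure and a pair `(p, y)` otherwise,
with the string "✓" standing for successful termination; the state names are
made up. It confirms that a state behaving like the looping program above is not
bisimilar to a state behaving like 0.

```python
TICK = "✓"  # successful termination; must not be used as a state name


def matches(ox, oy, rel):
    """Do the outputs of two states under one atom agree, up to rel?"""
    if ox is None or oy is None:
        return ox is None and oy is None      # both fail
    (p, x2), (q, y2) = ox, oy
    if p != q:
        return False                          # different actions
    if x2 == TICK or y2 == TICK:
        return x2 == TICK and y2 == TICK      # both terminate successfully
    return (x2, y2) in rel                    # successors must be related


def bisimilarity(h, atoms):
    """Largest bisimulation on the automaton h, as a greatest fixpoint."""
    rel = {(x, y) for x in h for y in h}
    while True:
        keep = {(x, y) for (x, y) in rel
                if all(matches(h[x][a], h[y][a], rel) for a in atoms)}
        if keep == rel:
            return rel
        rel = keep


ATOMS = ["α", "β"]
h = {
    "loop": {a: ("p", "loop") for a in ATOMS},   # behaves like the looping program
    "zero": {a: None for a in ATOMS},            # behaves like 0
    "u": {a: ("p", "v") for a in ATOMS},         # a two-state unfolding of the loop
    "v": {a: ("p", "u") for a in ATOMS},
}
rel = bisimilarity(h, ATOMS)
print(("loop", "zero") in rel, ("loop", "u") in rel)
```

The states "loop" and "zero" accept the same (empty) language yet are separated
by the first refinement round, whereas "loop" and its unfolding "u" stay related.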


3.2 Axioms 


Next, we give an inference system for bisimilarity and language equivalence con- 
sisting of equations and equational inference rules. The axioms of skip-free GKAT 
are given in Fig. 2. They include the equation (†), which says that early deadlock
is the same as late deadlock. This is sound with respect to the language
interpretation, meaning that (†) is true if $x$ is replaced with a skip-free guarded
language, but it is not sound with respect to the bisimulation semantics. For example,
the expressions $p \cdot 0$ and $0$ are not bisimilar for any $p \in \Sigma$. Interestingly, this is the
only axiomatic difference between bisimilarity and language equivalence. 


7 This follows directly from seeing skip-free automata as a special type of coalgebra and 
the fact that the functor involved preserves weak pullbacks [36]. In fact, coalgebra 
has been an indispensable tool in the production of the current paper, guiding us to 
the correct definitions and simplifying many of the proofs. 
Remark 3.11. The underlying logical structure of our inference systems is
equational logic [5], meaning that provable equivalence is an equivalence relation that
is preserved by the algebraic operations.


Given expressions $e_1, e_2 \in \mathrm{GExp}^-$, we write $e_1 \equiv_\dagger e_2$ and say that $e_1$ and
$e_2$ are $\equiv_\dagger$-equivalent if the equation $e_1 = e_2$ can be derived from the axioms in
Fig. 2 without the axiom marked (†). We write $e_1 \equiv e_2$ and say that $e_1$ and $e_2$
are $\equiv$-equivalent if $e_1 = e_2$ can be derived from the whole set of axioms in Fig. 2.

The axioms in Fig. 2 are sound with respect to the respective semantics they 
axiomatize. The only axiom that is not sound w.r.t. bisimilarity is $x \cdot 0 \equiv 0$, as
this would relate automata with different behaviours (x may permit some action 
to be performed, and this is observable in the bisimulation). 


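Theorem 3.12 (Soundness). For any $e_1, e_2 \in \mathrm{GExp}^-$,


1. If $e_1 \equiv_\dagger e_2$, then $e_1 \mathrel{\underline{\leftrightarrow}} e_2$.
2. If $e_1 \equiv e_2$, then $e_1 \sim_{\mathcal{L}} e_2$.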

We consider the next two results, which are jointly converse to Theorem 3.12, 
to be the main theorems of this paper. They state that the axioms in Fig. 2 are 
complete for bisimilarity and language equivalence respectively, i.e., they describe 
a complete set of program transformations for skip-free GKAT. 


Theorem 3.13 (Completeness I). If $e_1 \mathrel{\underline{\leftrightarrow}} e_2$, then $e_1 \equiv_\dagger e_2$.

Theorem 3.14 (Completeness II). If $e_1 \sim_{\mathcal{L}} e_2$, then $e_1 \equiv e_2$.


We prove Theorem 3.13 in Section 5 by drawing a formal analogy between 
skip-free GKAT and a recent study of regular expressions in the context of process 
algebra [15]. We include a short overview of this recent work in the next section. 

We delay the proof of Theorem 3.14 to Section 6, which uses a separate 
technique based on the pruning method introduced in [39]. 


4 1-free Star Expressions 


Regular expressions were introduced by Kleene [22] as a syntax for the algebra 
of regular events. Milner offered an alternative interpretation of regular expres- 
sions [32], as what he called star behaviours. Based on work of Salomaa [37], 
Milner proposed a sound axiomatization of the algebra of star behaviours, but 
left completeness an open problem. After nearly 40 years of active research from 
the process algebra community, a solution was finally found by Grabmayer [14]. 

A few years before this result, Grabmayer and Fokkink proved that a suit- 
able restriction of Milner’s axioms gives a complete inference system for the 
behaviour interpretation of a fragment of regular expressions, called the one- 
free fragment [15]. In this section, we give a quick overview of Grabmayer and 
Fokkink’s one-free fragment [15], slightly adapted to use an alphabet that will 
be suitable to later use in one of the completeness proofs of skip-free GKAT. 
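\[
\frac{}{\alpha p \xrightarrow{\alpha p} \checkmark}
\qquad
\frac{r_1 \xrightarrow{\alpha p} \xi}{r_1 + r_2 \xrightarrow{\alpha p} \xi}
\qquad
\frac{r_2 \xrightarrow{\alpha p} \xi}{r_1 + r_2 \xrightarrow{\alpha p} \xi}
\qquad
\frac{r_1 \xrightarrow{\alpha p} r'}{r_1 r_2 \xrightarrow{\alpha p} r' r_2}
\]
\[
\frac{r_1 \xrightarrow{\alpha p} \checkmark}{r_1 r_2 \xrightarrow{\alpha p} r_2}
\qquad
\frac{r_1 \xrightarrow{\alpha p} r'}{r_1 * r_2 \xrightarrow{\alpha p} r'(r_1 * r_2)}
\qquad
\frac{r_1 \xrightarrow{\alpha p} \checkmark}{r_1 * r_2 \xrightarrow{\alpha p} r_1 * r_2}
\qquad
\frac{r_2 \xrightarrow{\alpha p} \xi}{r_1 * r_2 \xrightarrow{\alpha p} \xi}
\]

(Here $\xi$ stands for either $\checkmark$ or an expression $r'$.)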


Fig. 5. The small-step semantics of one-free star expressions. 


Syntax. In the process algebra literature [32,15,14], regular expressions gen- 
erated by a fixed alphabet A are called star expressions, and denote labelled 
transition systems (LTSs) with labels drawn from A. As was mentioned in Sec- 
tion 3, skip-free automata can be seen as certain LTSs where the labels are 
atomic test/atomic action pairs. In Section 5, we encode skip-free GKAT
expressions as one-free regular expressions and skip-free automata as LTSs with labels
drawn from $\mathrm{At} \cdot \Sigma$. We instantiate the construction from [15] of the set of star
expressions generated by the label set $\mathrm{At} \cdot \Sigma$.


Definition 4.1. The set StExp of one-free star expressions is given by

\[
\mathrm{StExp} \ni r_1, r_2 ::= 0 \mid \alpha p \in \mathrm{At} \cdot \Sigma \mid r_1 + r_2 \mid r_1 r_2 \mid r_1 * r_2
\]


Semantics. The semantics of StExp is now an instance of the labelled transition 
systems that originally appeared in [15], with atomic test/atomic action pairs 
as labels and a (synthetic) output state $\checkmark$ denoting successful termination.

For the rest of this paper, we call a pair $(S, t)$ a labelled transition system
when $S$ is a set of states and $t : S \to \mathcal{P}(\mathrm{At} \cdot \Sigma \times (\checkmark + S))$ is a transition structure.
We write $x \xrightarrow{\alpha p} y$ if $(\alpha p, y) \in t(x)$ and $x \xrightarrow{\alpha p} \checkmark$ if $(\alpha p, \checkmark) \in t(x)$.

The set StExp can be given the structure of a labelled transition system 
$(\mathrm{StExp}, \tau)$, defined in Fig. 5. If $r \in \mathrm{StExp}$, we write $\langle r \rangle$ for the transition system
obtained by restricting $\tau$ to the one-free star expressions reachable from $r$ and
call $\langle r \rangle$ the small-step semantics of $r$.

The bisimulation interpretation of one-free star expressions is subtler than 
the bisimulation interpretation of skip-free GKAT expressions. The issue is that 
labelled transition systems (LTSs) are nondeterministic in general: it is possible 
for an LTS to have both an $x \xrightarrow{\alpha p} y$ and an $x \xrightarrow{\alpha q} z$ transition for $p \ne q$ or $y \ne z$.
The appropriate notion of bisimilarity for LTSs can be given as follows. 


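Definition 4.2. Given labelled transition systems $(S, t)$ and $(T, u)$, a bisimulation
between them is a relation $R \subseteq S \times T$ s.t. for any $x \mathrel{R} y$ and $\alpha p \in \mathrm{At} \cdot \Sigma$,


1. $x \xrightarrow{\alpha p} \checkmark$ if and only if $y \xrightarrow{\alpha p} \checkmark$,
2. if $x \xrightarrow{\alpha p} x'$, then there exists $y'$ such that $y \xrightarrow{\alpha p} y'$ and $x' \mathrel{R} y'$, and
3. if $y \xrightarrow{\alpha p} y'$, then there exists $x'$ such that $x \xrightarrow{\alpha p} x'$ and $x' \mathrel{R} y'$.


As before, we denote the largest bisimulation by $\underline{\leftrightarrow}$. We call $x$ and $y$ bisimilar
and write $x \mathrel{\underline{\leftrightarrow}} y$ if $x \mathrel{R} y$ for some bisimulation $R$.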
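Union:
\[
x \equiv x + x \qquad x \equiv x + 0 \qquad x + y \equiv y + x \qquad x + (y + z) \equiv (x + y) + z
\]
Sequencing:
\[
0x \equiv 0 \qquad x(yz) \equiv (xy)z \qquad (x + y)z \equiv xz + yz
\]
Loops:
\[
x * y \equiv x(x * y) + y \qquad \frac{z \equiv xz + y}{z \equiv x * y}
\]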


Fig. 6. Axioms for equivalence for one-free star expressions. 


The following closure properties of bisimulations of LTSs are useful later. 
They also imply that bisimilarity is an equivalence relation. Like in the skip-free 
case, the bisimilarity equivalence class of a state is called its behaviour. 


Lemma 4.3. Let $(S, t)$, $(T, u)$, and $(U, v)$ be labelled transition systems.
Furthermore, let $R_1, R_2 \subseteq S \times T$ and $R_3 \subseteq T \times U$ be bisimulations. Then
$R_1^{op} = \{(y, x) \mid x \mathrel{R_1} y\}$, $R_1 \cup R_2$ and $R_1 \circ R_3$ are bisimulations.


Axiomatization. We follow [15], where it was shown that the axiomatization 
found in Fig. 6 is complete with respect to bisimilarity for one-free star
expressions. Given a pair $r_1, r_2 \in \mathrm{StExp}$, we write $r_1 \equiv_* r_2$ and say that $r_1$ and $r_2$ are
$\equiv_*$-equivalent if the equation $r_1 = r_2$ can be derived from the axioms in Fig. 6.
The following result is crucial to the next section, where we prove that the
axioms of $\equiv_\dagger$ are complete with respect to bisimilarity in skip-free GKAT.


Theorem 4.4 ([15, Theorem 7.1]). $r_1 \mathrel{\underline{\leftrightarrow}} r_2$ if and only if $r_1 \equiv_* r_2$.


5 Completeness for Skip-free Bisimulation GKAT 


This section is dedicated to the proof of our first completeness result,
Theorem 3.13, which says that the axioms of Fig. 2 (excluding †) are complete with
respect to bisimilarity in skip-free GKAT. Our proof strategy is a reduction of 
our completeness result to the completeness result for StExp (Theorem 4.4). 

The key objects of interest in the reduction are a pair of translations: one 
translation turns skip-free GKAT expressions into one-free star expressions and 
maintains bisimilarity, and the other translation turns (certain) one-free star 
expressions into skip-free GKAT expressions and maintains provable bisimilarity. 

We first discuss the translation between automata and labelled transition sys- 
tems, which preserves and reflects bisimilarity. We then introduce the syntactic 
translations and present the completeness proof. 


5.1 Transforming skip-free automata to labelled transition systems 


We can easily transform a skip-free automaton into an LTS by essentially turning 
$\xrightarrow{\alpha|p}$ transitions into $\xrightarrow{\alpha p}$ transitions. This can be formalized as follows.


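Definition 5.1. Given a set $X$, we define $\mathrm{grph}_X : (\bot + \Sigma \times (\checkmark + X))^{\mathrm{At}} \to
\mathcal{P}(\mathrm{At} \cdot \Sigma \times (\checkmark + X))$ to be $\mathrm{grph}_X(\theta) = \{(\alpha p, x) \mid \theta(\alpha) = (p, x)\}$. Given a skip-free
automaton $(X, h)$, we define $\mathrm{grph}_*(X, h) = (X, \mathrm{grph}_X \circ h)$.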
The function $\mathrm{grph}_X$ is injective: as its name suggests, $\mathrm{grph}_X(\theta)$ is essentially
the graph of $\theta$ when viewed as a partial function from At to $\Sigma \times (\checkmark + X)$. This
implies that the transformation $\mathrm{grph}_*$ of skip-free automata into LTSs preserves
and reflects bisimilarity.


Lemma 5.2. Let $x, y \in X$, and $(X, h)$ be a skip-free automaton. Then $x \mathrel{\underline{\leftrightarrow}} y$
in $(X, h)$ if and only if $x \mathrel{\underline{\leftrightarrow}} y$ in $\mathrm{grph}_*(X, h)$.


Leading up to the proof of Theorem 3.13, we also need to undo the effect of
$\mathrm{grph}_*$ on skip-free automata with a transformation that takes every LTS of the
form $\mathrm{grph}_*(X, h)$ to its underlying skip-free automaton $(X, h)$.

The LTSs that can be written in the form $\mathrm{grph}_*(X, h)$ for some skip-free
automaton $(X, h)$ can be described as follows. Call a set $U \in \mathcal{P}(\mathsf{At} \cdot \Sigma \times (\checkmark + X))$
graph-like if whenever $(\alpha p, x) \in U$ and $(\alpha q, y) \in U$, then $p = q$ and $x = y$. An
LTS $(S, t)$ is deterministic if $t(s)$ is graph-like for every $s \in S$.
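
The graph-like condition can be checked mechanically on a finite LTS. The sketch below is illustrative only: transitions are encoded as (atom, action, target) triples, and the helper `to_automaton` is a hypothetical name for the inverse transformation, not a definition taken from the paper.

```python
ACCEPT = "ACCEPT"  # stands for the check mark


def is_graph_like(transitions):
    """True if no atom labels two different (action, target) outcomes."""
    seen = {}
    for atom, action, target in transitions:
        if seen.setdefault(atom, (action, target)) != (action, target):
            return False
    return True


def is_deterministic(lts):
    """An LTS is deterministic if every state's transition set is graph-like."""
    return all(is_graph_like(ts) for ts in lts.values())


def to_automaton(lts, atoms):
    """Recover a skip-free automaton from a deterministic LTS (hypothetical helper)."""
    if not is_deterministic(lts):
        raise ValueError("LTS is not deterministic")
    automaton = {}
    for state, ts in lts.items():
        theta = {atom: None for atom in atoms}  # atoms without a transition reject
        for atom, action, target in ts:
            theta[atom] = (action, target)
        automaton[state] = theta
    return automaton


det = {"x": {("a", "p", "x"), ("b", "q", ACCEPT)}}
nondet = {"x": {("a", "p", "x"), ("a", "q", "x")}}
```

Here `nondet` fails the check because the atom "a" is paired with two different actions.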


Lemma 5.3. An LTS $(S, t)$ is deterministic if and only if $(S, t) = \mathrm{grph}_*(X, h)$
for some skip-free automaton $(X, h)$.


Remark 5.4. As mentioned in Footnote 7, there is a coalgebraic outlook in many 
of the technical details in the present paper. For the interested reader, grph and 
func are actually natural transformations between the functors whose coalgebras 
correspond to skip-free automata and labelled transitions, and are furthermore 
inverse to one another. This implies that $\mathrm{grph}_*$ and $\mathrm{func}_*$ witness an isomorphism
between the categories of skip-free automata and deterministic LTSs. 


5.2 Translating Syntax 


We can mimic the transformation of skip-free automata into deterministic la- 
belled transition systems and vice-versa by a pair of syntactic translations going 
back and forth between skip-free GKAT expressions and certain one-free star 
expressions. Similar to how only some labelled transition systems can be turned 
into skip-free automata, only some one-free star expressions have corresponding 
skip-free GKAT expressions—the deterministic ones. 

The definition of deterministic expressions requires the following notation:
given a test $b \in \mathsf{BExp}$, we define $b \cdot r$ inductively on $r \in \mathsf{StExp}$ as follows:


for any $\alpha p \in \mathsf{At} \cdot \Sigma$ and $r_1, r_2 \in \mathsf{StExp}$.


Definition 5.5. The set of deterministic one-free star expressions is the smallest
subset $\mathsf{Det} \subseteq \mathsf{StExp}$ such that $0 \in \mathsf{Det}$ and $\alpha p \in \mathsf{Det}$ for any $\alpha \in \mathsf{At}$ and $p \in \Sigma$,
and for any $r_1, r_2 \in \mathsf{Det}$ and $b \in \mathsf{BExp}$, $b \cdot r_1 + \bar{b} \cdot r_2$, $r_1 r_2$, and $(b \cdot r_1) * (\bar{b} \cdot r_2) \in \mathsf{Det}$.

From $\mathsf{GExp}^-$ to $\mathsf{Det}$. We can now present the translation of skip-free
expressions to deterministic one-free star expressions.


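Definition 5.6. We define the translation function $\mathrm{gtr} : \mathsf{GExp}^- \to \mathsf{Det}$ by

$$\mathrm{gtr}(0) = 0 \qquad \mathrm{gtr}(p) = \sum_{\alpha \in \mathsf{At}} \alpha p \qquad \mathrm{gtr}(e_1 +_b e_2) = b \cdot \mathrm{gtr}(e_1) + \bar{b} \cdot \mathrm{gtr}(e_2)$$

$$\mathrm{gtr}(e_1 \cdot e_2) = \mathrm{gtr}(e_1)\, \mathrm{gtr}(e_2) \qquad \mathrm{gtr}(e_1 \,{}^{(b)} e_2) = (b \cdot \mathrm{gtr}(e_1)) * (\bar{b} \cdot \mathrm{gtr}(e_2))$$

for any $b \in \mathsf{BExp}$, $p \in \Sigma$, $e_1, e_2 \in \mathsf{GExp}^-$.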


Remark 5.7. In Definition 5.6, we make use of a generalized sum $\sum_{\alpha \in \mathsf{At}}$. Technically,
this requires we fix an enumeration of $\mathsf{At}$ ahead of time, say $\mathsf{At} =
\{\alpha_1, \ldots, \alpha_n\}$, at which point we can define $\sum_{\alpha \in \mathsf{At}} r_\alpha = r_{\alpha_1} + \cdots + r_{\alpha_n}$. Of
course, $+$ is commutative and associative up to $\equiv_*$, so the actual ordering of
this sum does not matter as far as equivalence is concerned.
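
The translation can be sketched as a recursive function over syntax trees. The following Python fragment follows our reading of the clauses of Definition 5.6; the tuple-based encoding, the two-element atom list, and the decision to leave $b \cdot r$ as an unevaluated `guard` node (rather than computing it) are all assumptions made for illustration.

```python
from functools import reduce

# Hypothetical fixed enumeration of the atoms, as Remark 5.7 requires.
ATOMS = ["a1", "a2"]


def guard(b, r):
    """Unevaluated placeholder for the guarded star expression b . r."""
    return ("guard", b, r)


def neg(b):
    """Symbolic complement of a test."""
    return ("not", b)


def gtr(e):
    """Translate a skip-free GKAT syntax tree into a star-expression tree."""
    tag = e[0]
    if tag == "zero":
        return ("zero",)
    if tag == "act":  # ("act", p): sum of alpha p over all atoms alpha
        summands = [("prim", alpha, e[1]) for alpha in ATOMS]
        return reduce(lambda left, right: ("plus", left, right), summands)
    if tag == "ite":  # ("ite", b, e1, e2) encodes e1 +_b e2
        _, b, e1, e2 = e
        return ("plus", guard(b, gtr(e1)), guard(neg(b), gtr(e2)))
    if tag == "seq":  # ("seq", e1, e2) encodes e1 . e2
        return ("seq", gtr(e[1]), gtr(e[2]))
    if tag == "loop":  # ("loop", b, e1, e2) encodes the skip-free loop of e1 and e2
        _, b, e1, e2 = e
        return ("star", guard(b, gtr(e1)), guard(neg(b), gtr(e2)))
    raise ValueError(f"unknown expression: {e!r}")
```

Because the sum over atoms is folded in the fixed order of `ATOMS`, the output is determined syntactically, matching the role of the enumeration in Remark 5.7.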


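The most salient feature of this translation is that it respects bisimilarity.


Lemma 5.8. The graph of the translation function $\mathrm{gtr}$ is a bisimulation of labelled
transition systems between $\mathrm{grph}_*(\mathsf{GExp}^-, \partial)$ and $(\mathsf{StExp}, \tau)$. Consequently,
if $e_1 \mathrel{\underline{\leftrightarrow}} e_2$ in $\mathrm{grph}_*(\mathsf{GExp}^-, \partial)$, then $\mathrm{gtr}(e_1) \mathrel{\underline{\leftrightarrow}} \mathrm{gtr}(e_2)$ in $(\mathsf{StExp}, \tau)$.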


From $\mathsf{Det}$ to $\mathsf{GExp}^-$. We would now like to define a back translation function
$\mathrm{rtg} : \mathsf{Det} \to \mathsf{GExp}^-$ by induction on its argument. Looking at Definition 5.5, one
might be tempted to write $\mathrm{rtg}(b \cdot r_1 + \bar{b} \cdot r_2) = \mathrm{rtg}(r_1) +_b \mathrm{rtg}(r_2)$, but
it is possible for there to be distinct $b, c \in \mathsf{BExp}$ such that
$b \cdot r_1 + \bar{b} \cdot r_2 = c \cdot r_1 + \bar{c} \cdot r_2$, even when $b$ and $c$ have different atoms.


Definition 5.9. Say that $r_1, r_2 \in \mathsf{StExp}$ are separated by $b \in \mathsf{BExp}$ if $r_1 = b \cdot r_1$
and $r_2 = \bar{b} \cdot r_2$. If such a $b$ exists we say that $r_1$ and $r_2$ are separated.


Another way to define $\mathsf{Det}$ is therefore to say that $\mathsf{Det}$ is the smallest subset
of $\mathsf{StExp}$ containing $0$ and $\mathsf{At} \cdot \Sigma$ that is closed under sequential composition and
closed under unions and stars of separated one-free star expressions.

Suppose $r_1$ and $r_2$ are separated by both $b$ and $c$. Then one can prove that
$(b \vee c) \cdot r_1 \equiv_* b \cdot r_1 + c \cdot r_1 = r_1$ and $\overline{(b \vee c)} \cdot r_2 = (\bar{b} \wedge \bar{c}) \cdot r_2 = \bar{b} \cdot (\bar{c} \cdot r_2) = r_2$, so $r_1$
and $r_2$ are separated by $b \vee c$ as well. Since there are only finitely many Boolean
expressions up to equivalence, there is a maximal (weakest) test $b(r_1, r_2) \in \mathsf{BExp}$
such that $r_1$ and $r_2$ are separated by $b(r_1, r_2)$.


Definition 5.10. The back translation $\mathrm{rtg} : \mathsf{Det} \to \mathsf{GExp}^-$ is defined by

$$\mathrm{rtg}(0) = 0 \qquad \mathrm{rtg}(\alpha p) = p +_\alpha 0 \qquad \mathrm{rtg}(r_1 + r_2) = \mathrm{rtg}(r_1) +_{b(r_1, r_2)} \mathrm{rtg}(r_2)$$

$$\mathrm{rtg}(r_1 r_2) = \mathrm{rtg}(r_1) \cdot \mathrm{rtg}(r_2) \qquad \mathrm{rtg}(r_1 * r_2) = \mathrm{rtg}(r_1) \,{}^{(b(r_1, r_2))} \mathrm{rtg}(r_2)$$

for any $r_1, r_2 \in \mathsf{StExp}$. In the union and star cases, we may use that $r_1$ and $r_2$
are separated (by definition of $\mathsf{Det}$), so that $b(r_1, r_2)$ is well-defined.

The most salient property of $\mathrm{rtg}$ is that it preserves provable equivalence.

Lemma 5.11. Let $r_1, r_2 \in \mathsf{Det}$. If $r_1 \equiv_* r_2$, then $\mathrm{rtg}(r_1) \equiv_\dagger \mathrm{rtg}(r_2)$.


The last fact needed in the proof of completeness is that, up to provable 
equivalence, every skip-free GKAT expression is equivalent to its back-translation. 


Lemma 5.12. For any $e \in \mathsf{GExp}^-$, $e \equiv_\dagger \mathrm{rtg}(\mathrm{gtr}(e))$.


We are now ready to prove Theorem 3.13, that provable bisimilarity is com- 
plete with respect to behavioural equivalence in skip-free GKAT. 


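Theorem 3.13 (Completeness I). If $e_1 \mathrel{\underline{\leftrightarrow}} e_2$, then $e_1 \equiv_\dagger e_2$.


Proof. Let $e_1, e_2 \in \mathsf{GExp}^-$ be a bisimilar pair of skip-free GKAT expressions.
By Lemma 5.2, $e_1$ and $e_2$ are bisimilar in $\mathrm{grph}_*(\mathsf{GExp}^-, \partial)$. By Lemmas 4.3
and 5.8, the translation $\mathrm{gtr} : \mathrm{grph}_*(\mathsf{GExp}^-, \partial) \to (\mathsf{StExp}, \tau)$ preserves bisimilarity,
so $\mathrm{gtr}(e_1)$ and $\mathrm{gtr}(e_2)$ are bisimilar in $(\mathsf{StExp}, \tau)$ as well. By Theorem 4.4,
$\mathrm{gtr}(e_1) \equiv_* \mathrm{gtr}(e_2)$. Therefore, by Lemma 5.11, $\mathrm{rtg}(\mathrm{gtr}(e_1)) \equiv_\dagger \mathrm{rtg}(\mathrm{gtr}(e_2))$.
Finally, by Lemma 5.12, we have $e_1 \equiv_\dagger \mathrm{rtg}(\mathrm{gtr}(e_1)) \equiv_\dagger \mathrm{rtg}(\mathrm{gtr}(e_2)) \equiv_\dagger e_2$.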


6 Completeness for Skip-free GKAT 


The previous section establishes that $\equiv_\dagger$-equivalence coincides with bisimilarity
for skip-free GKAT expressions by reducing the completeness problem of skip- 
free GKAT up to bisimilarity to a solved completeness problem, namely that of 
one-free star expressions up to bisimilarity. In this section we prove a complete- 
ness result for skip-free GKAT up to language equivalence. We show this can be 
achieved by reducing it to the completeness problem of skip-free GKAT up to 
bisimilarity, which we just solved in the previous section. 

Despite bisimilarity being a less traditional equivalence in the context of 
Kleene algebra, this reduction simplifies the completeness proof greatly, and 
justifies the study of bisimilarity in the pursuit of completeness for GKAT. 

The axiom $e \cdot 0 = 0$ (which is the only difference between skip-free GKAT up
to language equivalence and skip-free GKAT up to bisimilarity) indicates that the 
only semantic difference between bisimilarity and language equivalence in skip- 
free GKAT is early termination. This motivates our reduction to skip-free GKAT 
up to bisimilarity below, which involves reducing each skip-free expression to an 
expression representing only the successfully terminating branches of execution. 

Now let us turn to the formal proof of Theorem 3.14, which says that if
$e, f \in \mathsf{GExp}^-$ are such that $\mathcal{L}(e) = \mathcal{L}(f)$, then $e \equiv f$. In a nutshell, our strategy is
to produce two terms $\lfloor e \rfloor, \lfloor f \rfloor \in \mathsf{GExp}^-$ such that $e \equiv \lfloor e \rfloor$, $f \equiv \lfloor f \rfloor$ and $\lfloor e \rfloor \mathrel{\underline{\leftrightarrow}} \lfloor f \rfloor$
in $(\mathsf{GExp}^-, \partial)$. The latter property tells us that $\lfloor e \rfloor \equiv_\dagger \lfloor f \rfloor$ by Theorem 3.13,
which allows us to conclude $e \equiv f$. The expression $\lfloor e \rfloor$ can be thought of as the
early termination version of $e$, obtained by pruning the branches of its execution
that cannot end in successful termination.

To properly define the transformation $\lfloor - \rfloor$ on expressions, we need the notion
of a dead state in a skip-free automaton, analogous to a similar notion from [42].

Definition 6.1. Let $(X, h)$ be a skip-free automaton. The set $D(X, h)$ is the
largest subset of $X$ such that for all $x \in D(X, h)$ and $\alpha \in \mathsf{At}$, either $h(x)(\alpha) = \bot$ or
$h(x)(\alpha) \in \Sigma \times D(X, h)$. When $x \in D(X, h)$, $x$ is dead; otherwise, it is live.


In the sequel, we say $e \in \mathsf{GExp}^-$ is dead when $e$ is a dead state in $(\mathsf{GExp}^-, \partial)$,
i.e., when $e \in D(\mathsf{GExp}^-, \partial)$. Whether $e$ is dead can be determined by a simple
depth-first search, since $e$ can reach only finitely many expressions by $\partial$. The
axioms of skip-free GKAT can also tell when a skip-free expression is dead.
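
For a finite skip-free automaton, the dead states can be computed as the complement of the states that can reach an accepting transition. The sketch below is a minimal illustration under assumed encodings (dictionaries for transition functions, `None` for the rejecting value, an `ACCEPT` sentinel); it uses a simple fixpoint iteration rather than the depth-first search mentioned above.

```python
ACCEPT = "ACCEPT"  # stands for the check mark


def live_states(automaton):
    """States from which some path reaches an accepting transition."""
    live = set()
    changed = True
    while changed:
        changed = False
        for state, theta in automaton.items():
            if state in live:
                continue
            for out in theta.values():
                # A state is live if one transition accepts or enters a live state.
                if out is not None and (out[1] == ACCEPT or out[1] in live):
                    live.add(state)
                    changed = True
                    break
    return live


def dead_states(automaton):
    """The largest set of states that can never reach acceptance."""
    return set(automaton) - live_states(automaton)


example = {
    "x": {"a": ("p", "y"), "b": ("q", ACCEPT)},
    "y": {"a": ("p", "y"), "b": None},
    "z": {"a": ("p", "x"), "b": None},
}
```

In `example`, state y only loops or rejects, so it is dead, while z is live because it can step to x, which accepts on "b".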


Lemma 6.2. Let $e \in \mathsf{GExp}^-$. If $e$ is dead, then $e \equiv 0$.


We are now ready to define $\lfloor - \rfloor$, the transformation on expressions promised
above. The intuition here is to prune the dead subterms of e by recursive descent; 
whenever we find a part that will inevitably lead to an expression that is never 
going to lead to acceptance, we set it to 0. 


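Definition 6.3. Let $e \in \mathsf{GExp}^-$ and $a \in \mathsf{BExp}$. In the sequel we use $ae$ as a
shorthand for $e +_a 0$. We furthermore define $\lfloor e \rfloor$ inductively, as follows:

$$\lfloor 0 \rfloor = 0 \qquad \lfloor p \rfloor = p \qquad \lfloor e_1 +_b e_2 \rfloor = \lfloor e_1 \rfloor +_b \lfloor e_2 \rfloor$$

$$\lfloor e_1 \cdot e_2 \rfloor = \begin{cases} 0 & e_2 \text{ is dead} \\ \lfloor e_1 \rfloor \cdot \lfloor e_2 \rfloor & \text{otherwise} \end{cases} \qquad \lfloor e_1 \,{}^{(b)} e_2 \rfloor = \begin{cases} 0 & \bar{b} e_2 \text{ is dead} \\ \lfloor e_1 \rfloor \,{}^{(b)} \lfloor e_2 \rfloor & \text{otherwise} \end{cases}$$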


The transformation defined above yields a term that is $\equiv$-equivalent to $e$,
provided that we include the early termination axiom $e \cdot 0 = 0$. The proof is a
simple induction on $e$, using Lemma 6.2.


Lemma 6.4. For any $e \in \mathsf{GExp}^-$, $e \equiv \lfloor e \rfloor$.


It remains to show that if $\mathcal{L}(e) = \mathcal{L}(f)$, then $\lfloor e \rfloor$ and $\lfloor f \rfloor$ are bisimilar. To
this end, we need to relate the language semantics of e and f to their behaviour. 
As a first step, we note that behaviour that never leads to acceptance can be 
pruned from a skip-free automaton by removing transitions into dead states. 


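Definition 6.5. Let $(X, h)$ be a skip-free automaton. Define $\lfloor h \rfloor : X \to GX$ by

$$\lfloor h \rfloor(x)(\alpha) = \begin{cases} \bot & h(x)(\alpha) = (p, x'),\ x' \text{ is dead} \\ h(x)(\alpha) & \text{otherwise} \end{cases}$$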
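
Pruning is a pointwise rewrite of the transition function. The sketch below assumes the same hypothetical dictionary encoding as before and takes the set of dead states as an input rather than recomputing it; the function name `prune` is ours.

```python
ACCEPT = "ACCEPT"  # stands for the check mark


def prune(automaton, dead):
    """Replace every transition into a dead state by the rejecting value None."""
    pruned = {}
    for state, theta in automaton.items():
        pruned[state] = {
            atom: None if out is not None and out[1] in dead else out
            for atom, out in theta.items()
        }
    return pruned


example = {
    "x": {"a": ("p", "y"), "b": ("q", ACCEPT)},
    "y": {"a": ("p", "y"), "b": None},
    "z": {"a": ("p", "x"), "b": None},
}
# State y can never reach acceptance, so it is the only dead state here.
pruned = prune(example, dead={"y"})
```

Accepting transitions are never removed, since the acceptance marker is not a state and therefore cannot be dead.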


Moreover, language equivalence of two states in a skip-free automaton implies 
bisimilarity of those states, but only in the pruned version of that skip-free 
automaton. The proof works by showing that the relation on X that connects 
states with the same language is, in fact, a bisimulation in $(X, \lfloor h \rfloor)$.


Lemma 6.6. Let $(X, h)$ be a skip-free automaton and $x, y \in X$. We have

$$\mathcal{L}(x, (X, h)) = \mathcal{L}(y, (X, h)) \implies x \mathrel{\underline{\leftrightarrow}} y \text{ in } (X, \lfloor h \rfloor)$$

The final intermediate property relates the behaviour of states in the
pruned skip-free automaton of expressions to the syntactic skip-free automaton.


Lemma 6.7. The graph $\{(e, \lfloor e \rfloor) \mid e \in \mathsf{GExp}^-\}$ of $\lfloor - \rfloor$ is a bisimulation of
skip-free automata between $(\mathsf{GExp}^-, \lfloor \partial \rfloor)$ and $(\mathsf{GExp}^-, \partial)$.


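We now have all the ingredients necessary to prove Theorem 3.14.

Theorem 3.14 (Completeness II). If $e_1 \sim_{\mathcal{L}} e_2$, then $e_1 \equiv e_2$.


Proof. If $e_1 \sim_{\mathcal{L}} e_2$, then by definition $\mathcal{L}(e_1) = \mathcal{L}(e_2)$. By Lemma 6.6, $e_1 \mathrel{\underline{\leftrightarrow}} e_2$ in
$(\mathsf{GExp}^-, \lfloor \partial \rfloor)$, which by Lemma 6.7 implies that $\lfloor e_1 \rfloor \mathrel{\underline{\leftrightarrow}} \lfloor e_2 \rfloor$ in $(\mathsf{GExp}^-, \partial)$. From
Theorem 3.13 we know that $\lfloor e_1 \rfloor \equiv_\dagger \lfloor e_2 \rfloor$, and therefore $e_1 \equiv e_2$ by Lemma 6.4.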


7 Relation to GKAT 


So far we have seen the technical development of skip-free GKAT without much 
reference to the original development of GKAT as it was presented in [42] and [39]. 
In this section, we make the case that the semantics of skip-free GKAT is merely 
a simplified version of the semantics of GKAT, and that the two agree on which 
expressions are equivalent after embedding skip-free GKAT into GKAT. More 
precisely, we identify the bisimulation and language semantics of skip-free GKAT 
given in Section 3 with instances of the existing bisimulation [39] and lan- 
guage [42] semantics of GKAT proper. The main takeaway is that two skip-free 
GKAT expressions are equivalent in our semantics precisely when they are equiv- 
alent when interpreted as proper GKAT expressions in the existing semantics. 


7.1 Bisimulation semantics 


To connect the bisimulation semantics of skip-free GKAT to GKAT at large, we 
start by recalling the latter. To do this, we need to define GKAT automata. 


Definition 7.1. A (GKAT) automaton is a pair $(X, d)$ such that $X$ is a set and
$d : X \to (\bot + \checkmark + \Sigma \times X)^{\mathsf{At}}$ is a function called the transition function. We
write $x \xrightarrow{\alpha \mid p} y$ to denote $d(x)(\alpha) = (p, y)$, $x \Rightarrow \alpha$ to denote $d(x)(\alpha) = \checkmark$, and
$x \downarrow \alpha$ if $d(x)(\alpha)$ is undefined.


Automata can be equipped with their own notion of bisimulation.⁸


Definition 7.2. Given automata $(X, h)$ and $(Y, k)$, a bisimulation between them
is a relation $R \subseteq X \times Y$ such that if $x \mathrel{R} y$, $\alpha \in \mathsf{At}$ and $p \in \Sigma$:


1. if $h(x)(\alpha) = \bot$, then $k(y)(\alpha) = \bot$; and
2. if $h(x)(\alpha) = \checkmark$, then $k(y)(\alpha) = \checkmark$; and
3. if $h(x)(\alpha) = (p, x')$, then $k(y)(\alpha) = (p, y')$ such that $x' \mathrel{R} y'$.


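⁸ As in previous sections, automata can be studied as coalgebras for a given functor
and the notions below are instances of general abstract notions [17,36].

$$\frac{\alpha \le b}{b \Rightarrow \alpha} \qquad \frac{\alpha \le b \quad e_1 \Rightarrow \alpha}{e_1 +_b e_2 \Rightarrow \alpha} \qquad \frac{\alpha \le \bar{b} \quad e_2 \Rightarrow \alpha}{e_1 +_b e_2 \Rightarrow \alpha} \qquad \frac{\alpha \le b \quad e_1 \xrightarrow{\alpha \mid p} e'}{e_1 +_b e_2 \xrightarrow{\alpha \mid p} e'} \qquad \frac{\alpha \le \bar{b} \quad e_2 \xrightarrow{\alpha \mid p} e'}{e_1 +_b e_2 \xrightarrow{\alpha \mid p} e'}$$

$$\frac{}{p \xrightarrow{\alpha \mid p} 1} \qquad \frac{e_1 \Rightarrow \alpha \quad e_2 \Rightarrow \alpha}{e_1 \cdot e_2 \Rightarrow \alpha} \qquad \frac{e_1 \Rightarrow \alpha \quad e_2 \xrightarrow{\alpha \mid p} e'}{e_1 \cdot e_2 \xrightarrow{\alpha \mid p} e'} \qquad \frac{e_1 \xrightarrow{\alpha \mid p} e'}{e_1 \cdot e_2 \xrightarrow{\alpha \mid p} e' \mathbin{;} e_2}$$

$$\frac{\alpha \le b \quad e \xrightarrow{\alpha \mid p} e'}{e^{(b)} \xrightarrow{\alpha \mid p} e' \mathbin{;} e^{(b)}} \qquad \frac{\alpha \le \bar{b}}{e^{(b)} \Rightarrow \alpha}$$

Fig. 7. The transition function $\delta : \mathsf{GExp} \to (\bot + \checkmark + \Sigma \times \mathsf{GExp})^{\mathsf{At}}$ defined inductively.
Here, $e_1 \mathbin{;} e_2$ is $e_2$ when $e_1 = 1$ and $e_1 \cdot e_2$ otherwise, $b \in \mathsf{BExp}$, $p \in \Sigma$, and $e, e', e_i \in \mathsf{GExp}$.


We call $x$ and $y$ bisimilar and write $x \mathrel{\underline{\leftrightarrow}} y$ if $x \mathrel{R} y$ for some bisimulation $R$.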


Remark 7.3. The properties listed above are implications, but it is not hard to
show that if all three properties hold for $R$, then so do all of their symmetric
counterparts. For instance, if $k(y)(\alpha) = (p, y')$, then certainly $h(x)(\alpha)$ must be
of the form $(q, x')$, which then implies that $q = p$ while $x' \mathrel{R} y'$.


Two GKAT expressions are bisimilar when they are bisimilar as states in the
syntactic automaton [39], $(\mathsf{GExp}, \delta)$, summarised in Fig. 7.


Remark 7.4. The definition of $\delta$ given above diverges slightly from the definition
in [39]. Fortunately, this does not make a difference in terms of the bisimulation
semantics: two expressions are bisimilar in $(\mathsf{GExp}, \delta)$ if and only if they
are bisimilar in the original semantics. The full version [40] contains a detailed
account.


There is a fairly easy way to convert a skip-free automaton into a GKAT
automaton: simply reroute all accepting transitions into a new state $\top$ that
accepts immediately, and leave the other transitions the same.


Definition 7.5. Given a skip-free automaton $(X, d)$, we define the automaton
$\mathrm{embed}(X, d) = (X + \top, \tilde{d})$, where $\tilde{d}$ is defined by

$$\tilde{d}(x)(\alpha) = \begin{cases} \checkmark & x = \top \\ (p, \top) & d(x)(\alpha) = (p, \checkmark) \\ d(x)(\alpha) & \text{otherwise} \end{cases}$$
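
The embedding amounts to adding one fresh state and rerouting accepting transitions to it. In the following sketch the dictionary encoding, the `ACCEPT` sentinel, and the name `TOP` for the fresh state are assumptions made for illustration; `TOP` is assumed not to clash with an existing state name.

```python
ACCEPT = "ACCEPT"  # stands for the check mark
TOP = "TOP"        # the fresh state that accepts immediately


def embed(automaton, atoms):
    """Build a GKAT automaton whose outputs are None, ACCEPT, or (action, state)."""
    result = {TOP: {atom: ACCEPT for atom in atoms}}
    for state, theta in automaton.items():
        result[state] = {
            # Reroute "do p, then accept" into "do p, then move to TOP".
            atom: (out[0], TOP) if out is not None and out[1] == ACCEPT else out
            for atom, out in theta.items()
        }
    return result


skip_free = {"x": {"a": ("p", "x"), "b": ("q", ACCEPT)}}
gkat = embed(skip_free, ["a", "b"])
```

In the result, acceptance only ever happens in `TOP`, without performing an action, which matches the shape of GKAT automata in Definition 7.1.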


We can show that two states are bisimilar in a skip-free automaton if and
only if these same states are bisimilar in the corresponding GKAT automaton.


Lemma 7.6. Let $(X, d)$ be a skip-free automaton, and let $x, y \in X$.

$$x \mathrel{\underline{\leftrightarrow}} y \text{ in } (X, d) \iff x \mathrel{\underline{\leftrightarrow}} y \text{ in } \mathrm{embed}(X, d)$$


The syntactic skip-free automaton $(\mathsf{GExp}^-, \partial)$ can of course be converted to
a GKAT automaton in this way. It turns out that there is a very natural way of
correlating this automaton to the syntactic GKAT automaton $(\mathsf{GExp}, \delta)$.

Lemma 7.7. The relation $\{(e, e) : e \in \mathsf{GExp}^-\} \cup \{(\top, 1)\}$ is a bisimulation
between $\mathrm{embed}(\mathsf{GExp}^-, \partial)$ and $(\mathsf{GExp}, \delta)$.


We now have everything to relate the bisimulation semantics of skip-free
GKAT expressions to the bisimulation semantics of GKAT expressions at large.


Lemma 7.8. Let $e, f \in \mathsf{GExp}^-$. The following holds:

$$e \mathrel{\underline{\leftrightarrow}} f \text{ in } (\mathsf{GExp}^-, \partial) \iff e \mathrel{\underline{\leftrightarrow}} f \text{ in } (\mathsf{GExp}, \delta)$$


Proof. We derive using Lemmas 7.6 and 7.7, as follows: since the graph of embed
is a bisimulation, $e \mathrel{\underline{\leftrightarrow}} f$ in $(\mathsf{GExp}^-, \partial)$ if and only if $e \mathrel{\underline{\leftrightarrow}} f$ in $\mathrm{embed}(\mathsf{GExp}^-, \partial)$ if and only
if $e \mathrel{\underline{\leftrightarrow}} f$ in $(\mathsf{GExp}, \delta)$. In the last step, we use the fact that if $R$ is a bisimulation
(of automata) between $(X, h)$ and $(Y, k)$, and $S$ is a bisimulation between $(Y, k)$
and $(Z, \ell)$, then $R \circ S$ is a bisimulation between $(X, h)$ and $(Z, \ell)$.


7.2 Language semantics 


We now recall the language semantics of GKAT, which is defined in terms of 
guarded strings [28], i.e., words in the set $\mathsf{At} \cdot (\Sigma \cdot \mathsf{At})^*$, where atoms and actions
alternate. In GKAT, successful termination occurs with a trailing associated test,
representing the state of the machine at termination. In an execution of the
sequential composition of two programs $e \cdot f$, the test trailing the execution of $e$
needs to match up with an input test compatible with f, otherwise the program 
crashes at the end of executing e. The following operations on languages of 
guarded strings record this behaviour by matching the ends of traces on the left 
with the beginnings of traces on the right. 


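Definition 7.9. For $L, K \subseteq \mathsf{At} \cdot (\Sigma \cdot \mathsf{At})^*$, define $L \diamond K = \{w \alpha x : w\alpha \in L, \alpha x \in K\}$
and $L^{(*)} = \bigcup_{n \in \mathbb{N}} L^{(n)}$, where $L^{(n)}$ is defined inductively by setting $L^{(0)} = \mathsf{At}$
and $L^{(n+1)} = L \diamond L^{(n)}$.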
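
These operations are easy to experiment with on finite languages. The sketch below encodes a guarded string as a Python tuple alternating atoms and actions, such as `("a", "p", "b")`; this encoding and the function names `fuse` and `power` are illustrative assumptions.

```python
def fuse(L, K):
    """Fusion product: glue w.alpha with alpha.x into w.alpha.x when the atoms match."""
    return {w + x[1:] for w in L for x in K if w[-1] == x[0]}


def power(L, atoms, n):
    """The n-th fusion power, starting from the set of one-atom guarded strings."""
    result = {(a,) for a in atoms}
    for _ in range(n):
        result = fuse(L, result)
    return result


L = {("a", "p", "b")}
K = {("b", "q", "a"), ("a", "q", "a")}
fused = fuse(L, K)
```

Only the string of `K` that starts with the atom "b" can continue the trace in `L`; the other one is discarded, which models the crash described above when the trailing test and the next input test disagree.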


The language semantics of a GKAT expression is now defined in terms of the 
composition operators above, as follows. 


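Definition 7.10. We define $\widehat{\mathcal{L}} : \mathsf{GExp} \to \mathcal{P}(\mathsf{At} \cdot (\Sigma \cdot \mathsf{At})^*)$ inductively, as follows:

$$\widehat{\mathcal{L}}(b) = \{\alpha \in \mathsf{At} \mid \alpha \le b\} \qquad \widehat{\mathcal{L}}(p) = \{\alpha p \beta \mid \alpha, \beta \in \mathsf{At}\} \qquad \widehat{\mathcal{L}}(e \cdot f) = \widehat{\mathcal{L}}(e) \diamond \widehat{\mathcal{L}}(f)$$

$$\widehat{\mathcal{L}}(e +_b f) = \widehat{\mathcal{L}}(b) \diamond \widehat{\mathcal{L}}(e) \cup \widehat{\mathcal{L}}(\bar{b}) \diamond \widehat{\mathcal{L}}(f) \qquad \widehat{\mathcal{L}}(e^{(b)}) = (\widehat{\mathcal{L}}(b) \diamond \widehat{\mathcal{L}}(e))^{(*)} \diamond \widehat{\mathcal{L}}(\bar{b})$$


This semantics is connected to the relational semantics from Definition 2.2:


Theorem 7.11 ([42]). For $e, f \in \mathsf{GExp}$, we have $\widehat{\mathcal{L}}(e) = \widehat{\mathcal{L}}(f)$ if and only if
$[\![e]\!]_\sigma = [\![f]\!]_\sigma$ for all relational interpretations $\sigma$.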


Moreover, since skip-free GKAT expressions are also GKAT expressions, this
means that we now have two language interpretations of the former, given by $\mathcal{L}$
and $\widehat{\mathcal{L}}$. Fortunately, one can easily be expressed in terms of the other.

$$\begin{array}{lll}
\textbf{Guarded Union} & \textbf{Sequencing} & \textbf{Loops} \\
x = x +_b x & x(yz) = (xy)z & x\,x^{(b)} +_b 1 = x^{(b)} \\
x +_b y = y +_{\bar{b}} x & 0x = 0 & (x +_a 1)^{(b)} = (ax)^{(b)} \\
x +_b (y +_c z) = (x +_b y) +_{b \vee c} z & x0 = 0 \quad (\dagger) & \\
x +_b y = bx +_b y & 1x = x & \dfrac{z = xz +_b y \qquad E(x) = 0}{z = x^{(b)} y} \\
(x +_b y)z = xz +_b yz & x1 = x &
\end{array}$$

Fig. 8. Axioms for language semantics GKAT (without the Boolean algebra axioms
for tests). The function $E : \mathsf{GExp} \to \mathsf{BExp}$ is defined below. If the axiom marked ($\dagger$) is
omitted, the above potentially axiomatizes bisimilarity.


Lemma 7.12. For $e \in \mathsf{GExp}^-$, it holds that $\widehat{\mathcal{L}}(e) = \mathcal{L}(e) \cdot \mathsf{At}$.


As an easy consequence of the above, we find that the two semantics must
identify the same skip-free GKAT expressions.


Lemma 7.13. For $e, f \in \mathsf{GExp}^-$, we have $\mathcal{L}(e) = \mathcal{L}(f)$ iff $\widehat{\mathcal{L}}(e) = \widehat{\mathcal{L}}(f)$.


By Theorem 3.14, these properties imply that $\equiv$ also axiomatizes relational
equivalence of skip-free GKAT expressions.


Corollary 7.14. Let $e, f \in \mathsf{GExp}^-$. We have $e \equiv f$ if and only if $[\![e]\!]_\sigma = [\![f]\!]_\sigma$
for all relational interpretations $\sigma$.


7.3 Equivalences 


Finally, we relate equivalences as proved for skip-free GKAT expressions to those 
provable for GKAT expressions, showing that proofs of equivalence for skip-free 
GKAT expressions can be replayed in the larger calculus, without (UA). 

The axioms of GKAT as presented in [42,39] are provided in Figure 8. We
write $e \approx_\dagger f$ when $e = f$ is derivable from the axioms in Figure 8 with the
exception of ($\dagger$), and $e \approx f$ when $e = f$ is derivable from the full set.

The last axiom of GKAT is not really a single axiom, but rather an axiom
scheme, parameterized by the function $E : \mathsf{GExp} \to \mathsf{BExp}$ defined as follows:

$$E(b) = b \qquad E(p) = 0 \qquad E(e +_b f) = (b \wedge E(e)) \vee (\bar{b} \wedge E(f))$$

$$E(e \cdot f) = E(e) \wedge E(f) \qquad E(e^{(b)}) = \bar{b}$$
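
The function $E$ can be evaluated directly once tests are interpreted concretely. The sketch below makes the simplifying assumption that a test is represented by the set of atoms below it, over a hypothetical two-atom universe, so that conjunction, disjunction, and complement become set intersection, union, and difference; the tuple encoding of expressions is also ours.

```python
ATOMS = frozenset({"a1", "a2"})  # hypothetical two-atom universe


def E(e):
    """Atoms under which the GKAT expression e can accept without acting."""
    tag = e[0]
    if tag == "test":  # ("test", b) with b a set of atoms
        return frozenset(e[1])
    if tag == "act":  # ("act", p): a primitive action never accepts immediately
        return frozenset()
    if tag == "ite":  # ("ite", b, left, right) encodes left +_b right
        _, b, left, right = e
        b = frozenset(b)
        return (b & E(left)) | ((ATOMS - b) & E(right))
    if tag == "seq":  # ("seq", left, right)
        return E(e[1]) & E(e[2])
    if tag == "loop":  # ("loop", b, body) encodes body^(b)
        return ATOMS - frozenset(e[1])
    raise ValueError(f"unknown expression: {e!r}")


skip = ("test", ATOMS)  # the test 1
loop = ("loop", {"a1"}, ("act", "p"))
```

For instance, the loop above can exit immediately exactly under the atoms that falsify its guard, here `a2`.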


The function $E$ models the analogue of Salomaa's empty word property [37]: we
say $e$ is guarded when $E(e)$ is equivalent to $0$ by the laws of Boolean algebra.
Notice that as GKAT expressions, skip-free GKAT expressions are always guarded.
Since skip-free GKAT expressions are also GKAT expressions, we have four
notions of equivalence for skip-free GKAT expressions: as skip-free expressions or GKAT
expressions in general, either with or without ($\dagger$). These are related as follows.


Theorem 7.15. Let $e, f \in \mathsf{GExp}^-$. Then (1) $e \approx_\dagger f$ if and only if $e \equiv_\dagger f$, and
(2) $e \approx f$ if and only if $e \equiv f$.

Proof. For the forward direction of (1), we note that if $e \approx_\dagger f$, then $e \mathrel{\underline{\leftrightarrow}} f$ in
$(\mathsf{GExp}, \delta)$ by Theorem 3.12. By Lemma 7.8, $e \mathrel{\underline{\leftrightarrow}} f$ in $(\mathsf{GExp}^-, \partial)$ and therefore
$e \equiv_\dagger f$ by Theorem 3.13. Conversely, note that any proof of $e \equiv_\dagger f$ by the axioms
of Figure 2 can be replayed using the rules from Figure 8. In particular, the
guardedness condition required to replay the last skip-free GKAT axiom using the last
GKAT axiom is always satisfied, because $E(g)$ is equivalent to $0$ for any $g \in \mathsf{GExp}^-$.

The proof of the second claim is similar, but uses Theorem 3.14 instead.


8 Related Work 


This paper fits into a larger research program focused on understanding the 
logical and algebraic content of programming. Kleene’s paper introducing the 
algebra of regular languages [22] was a foundational contribution to this re- 
search program, containing an algebraic account of mechanical programming 
and some of its sound equational laws. The paper also contained an interesting 
completeness problem: give a complete description of the equations satisfied by 
the algebra of regular languages. Salomaa was the first to provide a sound and 
complete axiomatization of language equivalence for regular expressions [37]. 

The axiomatization in op. cit. included an inference rule with a side condition
that prevented it from being algebraic, in the sense that the validity of an equation
is not preserved when substituting arbitrary regular expressions for letters.
Nevertheless, this inspired axiomatizations of several variations and extensions 
of Kleene algebra [46,42,41], as well as Milner’s axiomatization of the algebra of 
star behaviours [32]. The side condition introduced by Salomaa is often called 
the empty word property, an early version of a concept from process theory called 
guardedness⁹ that is also fundamental to the theory of iteration [6].

Our axiomatization of skip-free GKAT is algebraic due to the lack of a guard- 
edness side-condition (it is an equational Horn theory [31]). This is particularly 
desirable because it allows for an abundance of other models of the axioms. 
Kozen proposed an algebraic axiomatization of Kleene algebra that is sound 
and complete for language equivalence [24], which has become the basis for a 
number of axiomatizations of other Kleene algebra variants [13,19,20,47] includ- 
ing Kleene algebra with tests [25]. KAT also has a plethora of relational models, 
which are desirable for reasons we hinted at in Section 2. 

GKAT is a fragment of KAT that was first identified in [29]. It was later 
given a sound and complete axiomatization in [42], although the axiomatization 
is neither algebraic nor finite (it includes (UA), an axiom scheme that stands for 
infinitely many axioms). It was later shown that dropping $x \cdot 0 = 0$ (called (S3)
in [42]) from this axiomatization gives a sound and complete axiomatization of 
bisimilarity [39]. The inspiration for our pruning technique is also in [39], where 
a reduction of the language equivalence case to the bisimilarity case is discussed. 


⁹ This is a different use of the word “guarded” than in “guarded Kleene algebra with
tests”. In the context of process theory, a recursive specification is guarded if every
one of its function calls occurs within the scope of an operation.

Despite the existence of an algebraic axiomatization of language equivalence
in KAT, GKAT has resisted algebraic axiomatization so far. Skip-free GKAT hap- 
pens to be a fragment of GKAT in which every expression is guarded, thus 
eliminating the need for the side condition in Fig. 8 and allowing for an alge- 
braic axiomatization. An inequational axiomatization resembling that of KAT 
might be gleaned from the recent preprint [38], but we have not investigated this 
carefully. The GKAT axioms for bisimilarity of ground terms can also likely be 
obtained from the small-step semantics of GKAT using [1,2,3], but unfortunately 
this does not appear to help with the larger completeness problem. 

Reducing one completeness problem to another
is common in Kleene algebra; for instance, it is behind the completeness proof of
KAT [28]. Cohen also reduced weak Kleene algebra as an axiomatization of star 
expressions up to simulation to monodic trees [10], whose completeness was con- 
jectured by Takai and Furusawa [45]. Grabmayer’s solution to the completeness 
problem of regular expressions modulo bisimulation [14] can also be seen as a 
reduction to the one-free case [15], since his crystallization procedure produces 
an automaton that can be solved using the technique found in op. cit. Other in- 
stances of reductions include [9,4,11,47,19,21,30,34,26]. Recent work has started 
to study reductions and their compositionality properties [11,20,33]. 


9 Discussion 


We continue the study of efficient fragments of Kleene Algebra with Tests (KAT) 
initiated in [42], where the authors introduced Guarded Kleene Algebra with 
Tests (GKAT) and provided an efficient decision procedure for equivalence. They 
also proposed a candidate axiomatization, but left open two questions. 


— The first question concerned the existence of an algebraic axiomatization, 
which is an axiomatization that is closed under substitution—i.e., where one 
can prove properties about a certain program p and then use p as a variable 
in the context of a larger program, being able to substitute as needed. This 
is essential to enable compositional analysis. 

— The second question left open in [42] was whether an axiomatization that 
did not require an axiom scheme was possible. Having a completeness proof 
that does not require an axiom scheme to reason about mutually dependent 
loops is again essential for scalability: we should be able to axiomatize single 
loops and generalize this behaviour to multiple, potentially, nested loops. 


In this paper, we identified a large fragment of GKAT, which we call skip-free
GKAT ($\mathsf{GKAT}^-$), that can be axiomatized algebraically without relying on an axiom
scheme. We show how the axiomatization works well for two types of equivalence:
bisimilarity and language equivalence, by proving completeness results for
both semantics. Having the two semantics is interesting from a verification point 
of view as it gives access to different levels of precision when analyzing program 
behaviour, but also enables a layered approach to the completeness proofs.

We provide a reduction of the completeness proof for language semantics to
the one for bisimilarity. Moreover, the latter is connected to a recently solved [14]
problem proposed by Milner. This approach enabled two things: it breaks down
the completeness proofs and reuses some of the techniques while also highlighting
the exact difference between the two equivalences (captured by the axiom $e \cdot 0 = 0$,
which does not hold for bisimilarity). We also showed that proofs of equivalence
in skip-free GKAT transfer without any loss to proofs of equivalence in GKAT. 

There are several directions for future work. The bridge between process 
algebra and Kleene algebra has not been exploited to its full potential. The 
fact that we could reuse results by Grabmayer and Fokkink [14,15] was a major 
step towards completeness. An independent proof would have been much more 
complex and very likely required the development of technical tools resembling 
those in [14,15]. We hope the results in this paper can be taken further and more 
results can be exchanged between the two communities to solve open problems. 

The completeness problem for full GKAT remains open, but our completeness 
results for skip-free GKAT are encouraging. We believe they show a path towards 
studying whether an algebraic axiomatization can be devised or a negative re- 
sult can be proved. A first step in exploring a completeness result would be 
to try extending Grabmayer’s completeness result [14] to a setting with output 
variables; this is a non-trivial exploration, but we are hopeful that it will yield new
tools for completeness. As mentioned in the introduction, NetKAT [4] (and its
probabilistic variants [12,43]) has been one of the most successful extensions of
KAT. We believe the step from skip-free GKAT to a skip-free guarded version of 
NetKAT is also a worthwhile exploration. Following [16], we hope to be able to 
explore these extensions in a modular and parametric way.


Abstract. Distributed algorithms solving agreement problems like consensus
or state machine replication are essential components of modern
fault-tolerant distributed services. They are also notoriously hard to un- 
derstand and reason about. Their complexity stems from the different as- 
sumptions on the environment they operate with, i.e., process or network 
link failures, Byzantine failures etc. In this paper, we propose a novel ab- 
stract representation of the dynamics of such protocols which focuses on 
quorums of responses (votes) to a request (proposal) that form during a 
run of the protocol. We show that focusing on such quorums, a run of 
a protocol can be viewed as working over a tree structure where differ- 
ent branches represent different possible outcomes of the protocol, the 
goal being to stabilize on the choice of a fixed branch. This abstraction 
resembles the description of recent protocols used in Blockchain infrastructures,
e.g., the protocol supporting Bitcoin or HotStuff. We show
that this abstraction supports reasoning about the safety of various al- 
gorithms, e.g., Paxos, PBFT, Raft, and HotStuff, in a uniform way. In 
general, it provides a novel induction based argument for proving that 
such protocols are safe. 


1 Introduction 


Consensus or state-machine replication protocols are essential ingredients for 
maintaining strong consistency in modern fault-tolerant distributed systems. 
Such protocols must execute in the presence of concurrent and asynchronous 
message exchanges as well as benign (message loss, process crash) or Byzantine 
failures (message corruption). Developing practical implementations or reason- 
ing about their correctness is notoriously difficult. Standard examples include 
the classic Paxos [21] or PBFT [5] protocols, or the more recent HotStuff [37] 
protocol used in Blockchain infrastructures. 

In this paper, we propose a new abstraction for representing the executions 
of such protocols that can be used, in particular, to reason about their safety,
i.e., ensuring Agreement (e.g., all correct processes decide on a single value) and
Validity (e.g., the decided value has been proposed by some node participat- 
ing in the protocol). Usually, protocol executions are composed of a number of 
communication-closed rounds [11], and each round consists of several phases in 
which a process broadcasts a request and expects to collect responses from a 
quorum of processes before advancing to the next phase. The abstraction is de- 
fined as a sequential object called Quorum Tree (QTree) which maintains a tree 
structure where each node corresponds to a different round in an execution. The 
operations of QTree, to add or change the status of a node, model quorums of 
responses that have been received in certain phases of a round. 


For instance, a round in single-decree Paxos consists of two phases: a prepare 
phase where a pre-determined leader broadcasts a request for joining that round 
and expects a quorum of responses from the other processes before advancing to 
a vote phase where it broadcasts a value to agree upon and expects a quorum 
of responses (votes) in order to declare that value as decided in that round. 
Rounds are initiated by their respective leaders and can run concurrently. The 
idea behind QTree is to represent a Paxos execution using a rooted tree where 
each node different from the root corresponds to a round where the leader has 
received a quorum of responses in the prepare phase. The parent-child relation 
models the data flow from one round to a later round: responses to join requests 
contain values voted for in previous rounds (if any) and one of them will be 
included by the leader in the vote phase request. The round in which that value 
was voted defines the parent. Then, each node has one out of three possible 
statuses: ADDED if the vote phase can still be successful (the leader can collect a 
quorum of votes) but this did not happen yet, GHOST if the vote phase can not 
be successful (e.g., a majority of processes advanced to the next round without 
voting), and COMMITTED if the leader has received a quorum of responses in the 
vote phase. This is a tree structure because before reaching a quorum in the vote 
phase of a round, other rounds can start and their respective leaders can send 
other vote requests (with possibly different values). The specific construction of 
requests and responses in Paxos ensures that all the COMMITTED nodes in this 
tree belong to a single branch, which entails the agreement property (this will 
become clearer when presenting the precise definition of QTree in Section 2). 


The QTree abstraction is applicable to a wide range of protocols beyond 
the single-decree Paxos sketched above. It applies to state-machine replication 
protocols like Raft [36] and HotStuff [37] where the tree structure represents 
logs of commands (inputted by clients) stored at different processes and orga- 
nized according to common prefixes (each node corresponds to a single com- 
mand) and multi-decree consensus protocols like multi-Paxos [21] and its vari- 
ants [16,26,23,18], or PBFT [5] where different consensus instances (for different 
indices in a sequence of commands) are modeled using different QTree instances. 


We show that all these protocols are refinements of QTree in the sense that 
their executions can be mapped to sequences of operations on a QTree state, 
which are about agreeing on a branch of the tree called the trunk. These
operations are defined as invocations of two methods, add and commit, for adding a
new leaf to the tree (during which some other nodes may turn to GHOST) and
changing the status of a node from ADDED to COMMITTED, respectively. Any se- 
quence of invocations to these methods ensures that all the COMMITTED nodes lie 
on the same branch of the tree (the trunk). In relation to protocol executions, 
add and commit invocations that concern the same node correspond to receiving 
a quorum of responses in two specific phases of a round, which vary from one 
protocol to another. 

The mapping between protocol executions and QTree executions is defined as 
in proofs of linearizability for concurrent objects with fixed linearization points. 
Analogous to linearizability, where the goal is to show that an object method 
takes effect instantaneously at a point in time called linearization point, we 
show that it is possible to mark certain steps of a given protocol as linearization 
points of add or commit operations⁴, such that the sequence of add and commit
invocations defined by the order between linearization points along a protocol 
execution is a correct QTree execution. We introduce a declarative character- 
ization of correct QTree executions that simplifies the proof of the latter (see 
Section 3). 

The QTree abstraction offers a novel view on the dynamics of classic consen- 
sus or state-machine replication protocols like Paxos, Raft, and PBFT, which 
relates to the description of recent Blockchain protocols like HotStuff and Bit- 
coin [27], i.e., agreeing on a branch in a tree. It provides a formal framework 
to reason uniformly about single-decree consensus protocols and state-machine 
replication protocols like Raft and HotStuff. For single-decree protocols (or
compositions thereof), the parent-child relation between QTree nodes corresponds
to the data flow between a quorum of responses to a leader and the request the
leader sends in the next phase, while for Raft and HotStuff it corresponds to an
order set by a leader between different commands.

Our work relies on a hypothesis that correctness proofs based on establishing 
a refinement towards an operational specification such as QTree, which can be 
understood as a sequence of steps, are much more intuitive and “explainable” 
compared to classic proofs based on inductive invariants. An inductive invariant 
has to describe all intermediate states produced by all possible orders of receiving 
messages and a precise formalization is quite complex. As an indication, the 
Paxos invariant used in recent work [29] (see formulas (4) to (12) in Section 5.2) 
is a conjunction of eight quantified first-order formulas which are hard to reason 
about and not re-usable in the context of a different protocol. 

We believe that operational specifications are also helpful in taming
complexity while designing new protocols or implementations thereof, or in gaining
confidence about their correctness without going through ad-hoc and brittle
proof arguments. For instance, our proofs are very clear about the phases of a
round in which quorums need to intersect, which provides flexibility and
optimization opportunities for deciding on quorum sizes in each phase. Depending on
environment assumptions, quorum sizes can be optimized while preserving
correctness. Compared to previous operational specifications for reasoning about
consensus protocols, e.g., [3,12], QTree is designed to be less abstract so that
the refinement proof, establishing the relationship between a given protocol and
QTree, is less complex (see Section 8 for details).

4 These linearization points are fixed in the sense that they correspond to specific
instructions in the code of the protocol, and they do not depend on the future of
an execution. For an expert reader, this actually corresponds to a proof of strong
linearizability [15].


2 Quorum Tree 


We describe the QTree sequential object which operates on a tree and has two 
methods add and commit for adding a new node and modifying an attribute 
of a node (committing a node), respectively. When used as an abstraction of 
consensus protocols, invocations of these two methods correspond to certain 
quorums that are reached during a round of the protocol. 


2.1 Overview 


QTree is a sequential rooted-tree, a possible state being depicted in Figure 1. 
The nodes with black dashed margins are not members of the tree and they are 
discussed later. Each node in the tree contains a round number, a value, and a 
status field set to ADDED, GHOST, or COMMITTED. The round number acts as an 
identifier of a node since there can not exist two nodes with the same round 
number. The Root node is part of the initial state and its status is COMMITTED. 
A QTree state consists of a trunk, alive branches, and dead branches; a branch is
a chain of nodes connected by the parent relation. Alive branches are extensible 
with new ADDED nodes but dead branches are not. The trunk is a particular 
branch of the tree that starts from the root. It contains all the COMMITTED nodes 
and it ends with a COMMITTED node. It may also contain ADDED or GHOST nodes. 
For example, in Figure 1, the trunk consists of Root and n3. All alive branches are 
connected to the last COMMITTED node of the trunk (alive branches can include 
ADDED or GHOST nodes). For instance, in Figure 1, the subtree rooted at n3
contains a single alive branch whose leaf node is n5. Dead branches can contain
only GHOST nodes. In Figure 1, the tree contains a single dead branch containing
the node n1.

Nodes can be added to the tree as leaves. The status of a newly added node is 
either ADDED or GHOST. The status ADDED may turn to GHOST or COMMITTED. The 
GHOST status is “final” meaning that it can never turn into COMMITTED afterwards. 
However, GHOST nodes can be part of alive branches, and they can help in growing 
the tree. 

QTree has two methods add and commit: 


— add generates a new leaf with round number r, value v, and parent p
identified by the round number rp given as an input. Its status is set to ADDED
or GHOST provided that some conditions hold. If the status of the new node
is set to ADDED, then it either extends (has a path to the end of) an existing
alive branch or creates a new alive branch from the trunk. The new node
may also “invalidate” some other nodes by changing their status from ADDED
to GHOST.

— commit extends the trunk by turning the status of a node from ADDED to
COMMITTED. This extension of the trunk may prevent some branches from being
extended in the future (some alive branches may become dead), i.e., future
invocations of add that extend those branches will add only GHOST nodes.


Each node models the evolution of a round in a consensus protocol and the value 
attribute represents the value proposed by the leader of that round. The round 
and value attributes of a node are immutable and cannot be changed later. We 
assume that round numbers are strictly positive except for Root whose round 
number is 0. 

QTree applies uniformly to a range of consensus or state-machine replication 
protocols. We start by describing a variation that applies to single-decree con- 
sensus protocols, where a number of processes aim to agree on a single value. 
Multi-decree consensus protocols that are used to solve state-machine replication 
can be simulated using a number of instances of QTree, one for each decree (the 
instances are independent one from another). Then, state-machine replication 
protocols like HotStuff that rely directly on a tree structure to order commands 
can be simulated by the QTree for single-decree consensus modulo a small change 
that we discuss later. 


2.2 Definition of the Single-Decree Version 


Algorithm 1 lists a description of QTree in pseudo-code. The following set of 
predicates are used as conditions inside methods: 


1. $link(n) \equiv n.parent \in Nodes \land n.parent.round < n.round$
2. $newRound(n) \equiv \forall n' \in Nodes.\ n'.round \neq n.round$
3. $maxCommitted(n) \equiv n.status = \mathrm{COMMITTED} \land (\forall n' \in Nodes.\ n'.status = \mathrm{COMMITTED} \Rightarrow n'.round \le n.round)$
4. $extendsTrunk(n) \equiv \exists n' \in Nodes.\ maxCommitted(n') \land (n \text{ extends } n' \lor n.round < n'.round)$
5. $valid(n) \equiv link(n) \land newRound(n) \land extendsTrunk(n)$
6. $valueConstraint(n) \equiv n.parent \neq Root \Rightarrow n.value = n.parent.value$


The add method (lines 5-17) generates a new node n with round, value, and 
parent set according to the method’s inputs. Then, it adds n to the tree by 
linking it to the selected parent if n satisfies the following validity conditions: 


— n’s parent belongs to the tree and its round number is smaller than r (pred- 
icate link at (1)), 

— the tree does not contain a node with round number r (predicate newRound 
at (2)), 

— if r is bigger than the round number of the last node of the trunk, then n
must extend the trunk (predicate extendsTrunk at (4)),

— n’s value must be the same as its parent’s value unless the parent is the Root 
(predicate valueConstraint at (6)).

Algorithm 1: The QTree object

      /* ⊥ denotes non-initialized values */
  1   Initialize:
  2     Root.round = 0; Root.status = COMMITTED;
  3     Root.value = ⊥; Root.parent = Root;
  4     Nodes = {Root};
  5   Method add(r, v, rp)
  6     Pre: r > 0
  7     n = new Node(round = r, status = ⊥, value = v, parent = p : p.round = rp);
  8     if valid(n) ∧ valueConstraint(n)
  9       Nodes = Nodes ∪ {n};
 10       n.status = ADDED;
 11       if ∃n' ∈ Nodes. n'.round > n.round
 12         n.status = GHOST;
 13       forall n' ∈ Nodes. n'.round < n.round
 14         if n' is conflicting with n
 15           n'.status ← GHOST;
 16       return OK
 17     return FAIL
 18   Method commit(r)
 19     if ∃n ∈ Nodes. n.round = r ∧ n.status = ADDED
 20       n.status ← COMMITTED;
 21       return OK
 22     return FAIL

Fig. 1: A state of QTree. We represent ADDED nodes with green solid margins,
GHOST nodes with red double-line margins, and COMMITTED nodes with blue thick
margins. The nodes with black dashed margins are not part of the state, they are
fictitious nodes used to explain the method for adding new nodes.



The valid predicate at (5) is the conjunction of the first three constraints. 

For example, let us consider an invocation of add in a state of QTree that
contains the non-dashed nodes in Figure 1. If the invocation generates n2, n4, or
n6 (receiving as input the corresponding attributes), then n2 and n6 satisfy
all these constraints and can be added to the tree. The node n4 fails the
extendsTrunk predicate because it is not extending the last node of the trunk (n3)
and its round number is higher.

If a node n satisfies the conditions above, the add method turns its status 
to either ADDED or GHOST. If there is another node in the tree with a higher 
round number, n’s status becomes GHOST. Otherwise, it becomes ADDED. As a 
continuation of the example above, the status of n2 is set to GHOST because the
tree already contains a node with a higher round number, and the status of n6 is
set to ADDED.

Moreover, the addition of n can “invalidate” some other nodes, i.e., turn their
status to GHOST. This is based on a notion of conflicting nodes. We say that
two nodes are conflicting if they are on different branches, i.e., there is no path
from one node to the other. An add invocation that adds a node n changes the
status of all the nodes n' in the tree that conflict with n and have a lower round
number than n to GHOST. For example, Figure 2 pictures a sequence of QTree
states in an execution, to be read from left to right. The first state represents
the result of executing add(1, v1, 0) on the initial state of QTree, adding node
n1. Executing add(3, v2, 0) on this first state creates another node n3 and sets
its status to ADDED. This invocation also turns the status of n1 to GHOST since
its round number is less than the round number of n3 and they are on different
branches. Afterwards, by executing add(2, v1, 1), a node n2 is added to the tree
with status GHOST since there is a node n3 on a different branch which has a
higher round number.

Fig. 2: Explaining the behavior of add and commit methods (states obtained, from
left to right, after add(1, v1, 0), add(3, v2, 0), add(2, v1, 1), and commit(3)).
Colors are interpreted as in Fig. 1.

The method add returns OK when the created node is effectively added to 
the tree (it satisfies the conditions described above) and FAIL, otherwise. 

Lastly, the commit method takes a round number r as input and turns the
status of the node with round number r to COMMITTED if it was ADDED. It
returns OK if successful and FAIL otherwise. As a continuation of the example
above, the right part of Figure 2 pictures a state obtained by executing commit(3)
on the state to the left. This sets the status of n3 to COMMITTED as n3 was
previously ADDED. Note that the conditions in add ensure that the tree cannot
contain two nodes with the same round number.
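
The behavior of add and commit described above can be sketched in Python as follows. This is only an illustrative reading of Algorithm 1, not the authors' artifact: the names QTree, Node, _ancestors, and _conflicting are hypothetical, nodes are indexed by their round number, a violated precondition r > 0 is reported as FAIL, and only ADDED nodes are turned to GHOST (following the prose, which speaks of changing a status "from ADDED to GHOST").

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

ADDED, GHOST, COMMITTED = "ADDED", "GHOST", "COMMITTED"
OK, FAIL = "OK", "FAIL"


@dataclass
class Node:
    round: int
    value: Optional[str]
    parent: int   # round number of the parent; Root is its own parent
    status: str


class QTree:
    """Hypothetical single-decree QTree sketch; nodes are keyed by round."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {0: Node(0, None, 0, COMMITTED)}

    def _ancestors(self, r: int) -> List[int]:
        """Rounds on the path from node r up to Root, including r itself."""
        path = [r]
        while r != 0:
            r = self.nodes[r].parent
            path.append(r)
        return path

    def _conflicting(self, r1: int, r2: int) -> bool:
        """Two nodes conflict when neither lies on the other's path to Root."""
        return r1 not in self._ancestors(r2) and r2 not in self._ancestors(r1)

    def add(self, r: int, v: str, rp: int) -> str:
        if r <= 0:                                  # Pre: r > 0 (assumed to FAIL)
            return FAIL
        if rp not in self.nodes or rp >= r:         # link
            return FAIL
        if r in self.nodes:                         # newRound
            return FAIL
        top = max(k for k, n in self.nodes.items() if n.status == COMMITTED)
        if not (top in self._ancestors(rp) or r < top):   # extendsTrunk
            return FAIL
        if rp != 0 and self.nodes[rp].value != v:   # valueConstraint
            return FAIL
        higher = any(k > r for k in self.nodes)     # line 11 of Algorithm 1
        self.nodes[r] = Node(r, v, rp, GHOST if higher else ADDED)
        for k, n in self.nodes.items():             # lines 13-15 of Algorithm 1
            if k < r and n.status == ADDED and self._conflicting(k, r):
                n.status = GHOST
        return OK

    def commit(self, r: int) -> str:
        n = self.nodes.get(r)
        if n is not None and n.status == ADDED:
            n.status = COMMITTED
            return OK
        return FAIL
```

Replaying the sequence of Figure 2 on this sketch, i.e., add(1, "v1", 0), add(3, "v2", 0), add(2, "v1", 1), and commit(3), leaves the nodes with rounds 1 and 2 in status GHOST and the node with round 3 in status COMMITTED.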


Safety Properties. We show that the QTree object in Algorithm 1 can be used 
to reason about the safety of single-decree consensus protocols, in the sense that 
it satisfies a notion of Validity (processes agree on one of the proposed values) 
and Agreement (processes decide on a single value). More precisely, we show that 
every state that is reachable by executing a sequence of invocations of add and 
commit (in Algorithm 1), called simply reachable state, satisfies the following: 


— Validity: every node different from Root contains the same value as a child 
of Root, and 

— Agreement: every two COMMITTED nodes different from Root contain the same 
value. 


Proposition 1 (Validity). Every node in a reachable state that is different 
from Root contains the same value as a child of Root. 


Proof. A node n is added to the tree only if the predicate valueConstraint holds, 
which implies that it is either a child of Root or it has the same value as its
parent which is a descendant of Root. Also, since the value attribute of a node
is immutable, any COMMITTED node contains the same value that it had when it 
was created by an add invocation. 


Therefore, the fact that a consensus protocol refining QTree satisfies validity, 
i.e., processes decide on a value proposed by a client of the protocol, reduces 
to proving that the phases of a round simulated by add invocations that add 
children of Root use values proposed by a client. This is ensured using additional 
mechanisms, i.e., a client broadcasts its value to all participants in the protocol, 
so that each participant can check the validity of a value proposed by a leader. 

Next, we focus on Agreement, and show that COMMITTED nodes belong to a 
single branch of the tree. 


Proposition 2. Let n1 and n2 be two COMMITTED nodes in a reachable state.
Then, n1 and n2 are not conflicting.


Proof. Assume towards contradiction that QTree reaches a state where two
COMMITTED nodes n1 and n2 are conflicting. Let r1 = n1.round and r2 = n2.round.
Without loss of generality, we assume that r1 < r2. Such a state is reachable if
add(r1, _, _) and add(r2, _, _) resulted in adding the nodes n1 and n2 and set
their status to ADDED (we use _ to denote arbitrary values), and subsequently,
commit(r1) and commit(r2) switched the status of both n1 and n2 to COMMITTED.

If add(r1, _, _) were to execute before add(r2, _, _), then add(r2, _, _) would
have changed the status of n1 to GHOST because it is conflicting with n2.
Otherwise, if add(r2, _, _) were to execute before add(r1, _, _), then the latter would
have set the status of n1 to GHOST since the tree contains n2 that has a higher
round number. In both cases, executing commit(r1) can never turn the status of
n1 to COMMITTED.


Proposition 2 allows us to conclude that any two COMMITTED nodes (different
from Root) contain the same value. Indeed, a node can become COMMITTED only
if it was ADDED, which implies that it has the same value as its parent (the
predicate valueConstraint holds), and by transitivity, as any of its ancestors,
except for Root.


Proposition 3 (Agreement). Let n1 and n2 be two COMMITTED nodes in a
reachable state, which are different from Root. Then, n1.value = n2.value.
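
Propositions 2 and 3 can also be read as checks on a snapshot of the tree. The sketch below is a hedged illustration only: the Snapshot encoding (round number mapped to parent round, value, and status) and the function names are assumptions made here, and the branch check relies on round numbers decreasing towards Root, as guaranteed by the link predicate.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot encoding: round -> (parent round, value, status).
Snapshot = Dict[int, Tuple[int, Optional[str], str]]


def path_to_root(tree: Snapshot, r: int) -> List[int]:
    """Rounds on the path from node r up to Root (round 0), inclusive."""
    path = [r]
    while r != 0:
        r = tree[r][0]
        path.append(r)
    return path


def committed_on_one_branch(tree: Snapshot) -> bool:
    """Proposition 2: no two COMMITTED nodes are conflicting.

    Since rounds decrease towards Root, this holds exactly when every
    COMMITTED node lies on the path from the highest COMMITTED round to Root.
    """
    committed = sorted(r for r, (_, _, s) in tree.items() if s == "COMMITTED")
    if not committed:
        return True
    branch = path_to_root(tree, committed[-1])
    return all(r in branch for r in committed)


def committed_values_agree(tree: Snapshot) -> bool:
    """Proposition 3: COMMITTED nodes other than Root carry a single value."""
    values = {v for r, (_, v, s) in tree.items() if s == "COMMITTED" and r != 0}
    return len(values) <= 1
```

Both checks hold on the final state of Figure 2 and fail on a hand-built snapshot in which two children of Root with different values are both marked COMMITTED, a state that Proposition 2 rules out as unreachable.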


2.3 State Machine Replication Versions 


The single-decree version described above can be extended easily to a multi- 
decree context. As multi-decree consensus protocols, used in state machine repli- 
cation, can be seen as a composition of multiple instances of single-decree consen- 
sus protocols, a multi-decree version of QTree is obtained by composing multiple 
instances of the single-decree version. Each of these instances manipulates a tree 
as described above without interference from other instances. The validity and 
agreement properties above apply separately to each instance. 

The single-decree version can also be extended for state machine replication
protocols like HotStuff and Raft where the commands (values) are a priori
structured as a tree, i.e., each command given as input is associated with a
predetermined parent in this tree. Then, the goal of such a protocol is to agree
on a sequence in which to execute these commands, i.e., a branch in this tree. 
Simply removing the valueConstraint condition in the add method (underlined 
in Algorithm 1) enables QTree to simulate such protocols. A node’s value need 
not be the same as its parent’s value to be valid for add. Proposition 2 that 
implies the agreement property of such protocols still holds (Proposition 3 does 
not hold when the valueConstraint condition is removed; this property is specific 
to single-decree consensus). Since the value field remains immutable, the validity 
property of such protocols reduces to ensuring that the values generated during 
phases simulated by add correspond to commands issued by the client (Proposi- 
tion 1 is also specific to single-decree consensus and it does not hold). As before, 
this requires additional mechanisms, i.e., a client broadcasting a command to 
all the participants in the protocol, whose correctness can be established quite 
easily. 


3 Consensus Protocols Refining QTree 


In the following, we show that a number of consensus protocols are refinements of 
QTree in the sense that their executions can be mimicked with add and commit 
invocations. This is similar to a linearizable concurrent object being mimicked 
with invocations of a sequential specification. The refinement relation allows us to
conclude that the Validity and Agreement properties of QTree imply similar
properties for any of its refinements.

The definition of the refinement relation relies on a formalization of protocols 
and QTree as labeled transition systems. For a given protocol, a state is a tuple of 
process local states and a set of messages in transit, and a transition corresponds 
to an indivisible step of a process (receiving a set of messages, performing a local 
computation step, or sending a message). For QTree, a state is a tree of nodes 
as described above and a step corresponds to an invocation to add or commit. 
An execution is a sequence of transitions from the initial state. 

Refinement corresponds to a mapping between protocol executions and QTree 
executions. This mapping is defined as in proofs of linearizability for concurrent 
objects with fixed linearization points, where the goal is to show that each
concurrent object method appears to take effect instantaneously at a point in time
that corresponds to executing a fixed statement in its code. Therefore, certain 
steps of a given protocol are considered as linearization points of add and commit 
QTree invocations (returning OK), and one needs to prove that the sequence of 
invocations defined by the order of linearization points in a protocol execution 
is a correct execution of QTree. 

Formally, a labeled transition system (LTS) is a tuple $L = (Q, q_0, \mathcal{T}, A_L)$
where $Q$ is a set of states, $q_0$ is the unique initial state, $A_L$ is a set of actions
(transition labels) and $\mathcal{T}$ is a set of transitions $(q, a, q')$ such that $q, q' \in Q$
and $a \in A_L$. An execution $E$ from $q_0$ is a finite sequence of alternating states
and actions such that $E = q_0, a_0, q_1, a_1, \ldots, q_n$ with $(q_i, a_i, q_{i+1}) \in \mathcal{T}$ for each
$0 \le i \le n-1$. A trace $t$ is the sequence of actions projected from some execution
$E$. $T(L)$ denotes the set of traces of $L$.
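
As a minimal sketch of this definition, the following Python code represents a finite LTS and enumerates its traces up to a bounded number of transitions. The class name LTS, the function traces_up_to, and the bound parameter are hypothetical names chosen for illustration; the set of all traces is infinite in general, so only a finite prefix of it is computed.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Set, Tuple

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class LTS:
    states: FrozenSet[State]
    initial: State
    transitions: FrozenSet[Tuple[State, Action, State]]
    actions: FrozenSet[Action]


def traces_up_to(lts: LTS, bound: int) -> Set[Tuple[Action, ...]]:
    """Traces of all executions from the initial state with at most `bound` transitions."""
    frontier = {((), lts.initial)}   # pairs (trace so far, current state)
    traces: Set[Tuple[Action, ...]] = {()}   # the empty execution has the empty trace
    for _ in range(bound):
        frontier = {
            (trace + (a,), q2)
            for (trace, q) in frontier
            for (q1, a, q2) in lts.transitions
            if q1 == q
        }
        traces |= {trace for trace, _ in frontier}
    return traces
```

For a two-state system that alternates actions a and b, the bounded trace set with bound 3 consists of the empty trace, (a), (a, b), and (a, b, a).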

The standard notion of refinement between LTSs states that an LTS $L$ is a
refinement of another LTS $L'$ when $T(L) \subseteq T(L')$. In this paper, we consider a
slight variation of this definition of refinement that applies to LTSs that do not
share the same set of actions, representing for instance, some concrete protocol
and QTree, respectively. This notion of refinement is parametrized by a mapping
$\Gamma$ between actions of $L$ and $L'$, respectively. We say that $L$ $\Gamma$-refines $L'$ when
$\Gamma(T(L)) \subseteq T(L')$. Here, a mapping $\Gamma : A_L \to A_{L'}$ is extended to sequences
and sets of sequences as expected, e.g., $\Gamma(a_1 \ldots a_n) = \Gamma(a_1) \ldots \Gamma(a_n)$. With this
extension, the preservation of safety specifications from an LTS to a refinement
of it requires certain constraints on the mapping $\Gamma$ that will be discussed in
Section 4.2.
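
The inclusion $\Gamma(T(L)) \subseteq T(L')$ can be illustrated on explicitly given finite trace sets. The sketch below is an assumption-laden toy: map_trace and gamma_refines are hypothetical helper names, the mapping is taken to be a total function on actions, and the action names used in the usage note are invented for illustration rather than taken from any protocol.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

Trace = Tuple[Hashable, ...]


def map_trace(gamma: Callable[[Hashable], Hashable], trace: Trace) -> Trace:
    """Pointwise extension: gamma(a1 ... an) = gamma(a1) ... gamma(an)."""
    return tuple(gamma(a) for a in trace)


def gamma_refines(
    concrete: Iterable[Trace],
    gamma: Callable[[Hashable], Hashable],
    abstract: Set[Trace],
) -> bool:
    """Check that the image of every concrete trace is an abstract trace."""
    return all(map_trace(gamma, t) in abstract for t in concrete)
```

For instance, with a mapping that sends a hypothetical action "quorum_phase1" to "add" and "quorum_phase2" to "commit", the concrete trace ("quorum_phase1", "quorum_phase2") is mapped to ("add", "commit"), and the check succeeds exactly when that sequence belongs to the abstract trace set.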

In the context of proving that a concrete protocol refines QTree, the goal is
to define a mapping $\Gamma$ between actions of the protocol and QTree add/commit
invocations such that $\Gamma$ applied to protocol executions results in correct QTree
executions. In the following, we provide a characterization of correct QTree
executions that simplifies such refinement proofs.


3.1 Characterizing QTree Invocation Sequences 


An invocation label add(r, v, rp) $\Rightarrow$ RET or commit(r) $\Rightarrow$ RET combines a
QTree method name with input values and a return value RET $\in$ {OK, FAIL}.
An invocation label is called successful when the return value is OK. A sequence
$\sigma$ of invocation labels is called correct when there exist QTree states $q_0, \ldots, q_{|\sigma|}$
such that $q_0$ is the QTree initial state and for each $i \in [1, |\sigma|]$, executing $\sigma_i$
starting from $q_{i-1}$ leads to $q_i$.


Theorem 1. A sequence $\sigma$ of successful invocation labels is correct if and only
if the following hold (we use _ to denote arbitrary values):


1. for every r, $\sigma$ contains at most one invocation label add(r, _, _) and at most
one invocation label commit(r)

2. every commit(r) is preceded by an add(r, _, _)

3. if rp > 0, every add(r, v, rp) is preceded by add(rp, v', _) where 0 < rp < r
(a) and v = v'

4. if $\sigma$ contains add(r, _, _) and add(r', _, r'') with r'' < r < r', then $\sigma$ does
not contain commit(r)


Properties 1-3 are straightforward consequences of the add and commit defini- 
tions. Indeed, it is impossible to add two nodes with the same round number r, 
which implies that there can not be two successful add(r, _, _) invocations, the 
status of a node can be flipped to COMMITTED exactly once, which implies that 
there can not be two successful commit(r) invocations, and a commit(r) is suc- 
cessful only if a node with round number r already exists, hence Property 2 must 
hold. Moreover, a node’s parent defined by the input rp must already exist in the
tree, which implies that Property 3 must also hold. Property 4 is more involved
and relies on the fact that a node n with round number r can be COMMITTED only
if there exists no other conflicting node n' with a bigger round number r' (the
parent of n' having a round smaller than r implies that n and n' are conflicting).
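
The four properties of Theorem 1 can be checked directly on a sequence of successful invocation labels, without building the tree. The following Python sketch is illustrative only: the tuple encoding of labels and the name satisfies_theorem1 are assumptions made here, and round numbers are assumed to be positive integers with rp at least 0.

```python
from typing import Dict, List, Set, Tuple, Union

Add = Tuple[str, int, str, int]   # ("add", r, v, rp)
Commit = Tuple[str, int]          # ("commit", r)
Label = Union[Add, Commit]


def satisfies_theorem1(seq: List[Label]) -> bool:
    """Check Properties 1-4 on a sequence of successful invocation labels."""
    added: Dict[int, Tuple[str, int]] = {}   # r -> (value, parent round)
    committed: Set[int] = set()
    for label in seq:
        if label[0] == "add":
            _, r, v, rp = label
            if r in added:                        # Property 1 (add)
                return False
            if rp > 0:
                if rp not in added or rp >= r:    # Property 3
                    return False
                if added[rp][0] != v:             # Property 3a
                    return False
            added[r] = (v, rp)
        else:
            _, r = label
            if r in committed:                    # Property 1 (commit)
                return False
            if r not in added:                    # Property 2
                return False
            committed.add(r)
    # Property 4 does not depend on the order of labels in the sequence.
    for r in committed:
        for r2, (_, rpp) in added.items():
            if rpp < r < r2:
                return False
    return True
```

On the sequence of Figure 2, add(1, v1, 0), add(3, v2, 0), add(2, v1, 1), commit(3), the check succeeds, whereas a sequence containing add(1, v1, 0), add(3, v2, 0), and commit(1) is rejected by Property 4, since 0 < 1 < 3.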


Proof. ($\Rightarrow$): Assume that $\sigma$ is correct. We show that it satisfies the above
properties:


— Property 1: The newRound(n) predicate used at line 8 in Algorithm 1
ensures that it is impossible to add two nodes with the same round number r,
and therefore $\sigma$ cannot contain two successful add(r, _, _) $\Rightarrow$ OK
invocations. The conditions at line 19 ensure that commit(r) $\Rightarrow$ OK can flip the
status of a node only once, and therefore only one such successful invocation
can occur in $\sigma$.

— Property 2: The conditions at line 19 in Algorithm 1 imply that the state
in which commit(r) $\Rightarrow$ OK is executed contains a node with round number r.
This node could only have been added by a previous add(r, _, _) $\Rightarrow$ OK
invocation.

— Property 3: The link(n) predicate used at line 8 in Algorithm 1 ensures
that the state in which add(r, v, rp) $\Rightarrow$ OK is executed contains a node
with round number rp. If rp > 0, this node could only have been added by a
previous add(rp, v', _) $\Rightarrow$ OK invocation, for some v'.

• Property 3a: It is a direct consequence of the valueConstraint(n)
predicate used at line 8 in Algorithm 1.

— Property 4: Let n and n' be the nodes of the QTree state q reached after
executing $\sigma$, which have been added by add(r, _, _) $\Rightarrow$ OK and
add(r', _, r'') $\Rightarrow$ OK, respectively. We have that
n'.round > n.round > n'.parent.round. Since
the round numbers decrease when going from one node towards Root in a
reachable QTree state, it must be the case that n and n' are conflicting. By
Lemma 1, we get that n.status is GHOST. Since the GHOST status cannot
be turned to COMMITTED and vice-versa, it follows that $\sigma$ cannot contain
commit(r) $\Rightarrow$ OK.


($\Leftarrow$): We prove that every sequence $\sigma$ that satisfies properties 1-4 is correct. We
proceed by induction on the size of $\sigma$. The base step is trivial. For the induction
step, let $\sigma$ be a sequence of size k + 1. If $\sigma$ satisfies properties 1-4, then the
prefix $\sigma'$ containing the first k labels of $\sigma$ satisfies properties 1-4 as well. By
the induction hypothesis, $\sigma'$ is correct. We show that the last invocation of $\sigma$,
denoted by $\sigma_{k+1}$, can be executed in the QTree state $q_{|\sigma'|}$ reached after executing
$\sigma'$. We start with a lemma stating an inductive invariant for reachable QTree
states:


Lemma 1. For every node n in any state q reached after executing a correct
sequence $\sigma$ of successful invocations, n.status is COMMITTED if n is Root or $\sigma$
contains a commit(r) invocation with r = n.round. Else, n.status is GHOST if q
contains a node n' with n'.round > n.round and n' is conflicting with n, and it is
ADDED, otherwise.

Proof. We proceed by induction on the size of $\sigma$. The base step is trivial. For
the induction step, let $\sigma$ be a sequence of size m + 1. Let $q_m$ be the state reached
after executing the prefix of size m of $\sigma$, and let $\sigma_{m+1}$ be the last invocation
label of $\sigma$. We show that the property holds for any possible $\sigma_{m+1}$ that takes
the QTree state $q_m$ to some other state $q_{m+1}$:


— $\sigma_{m+1}$ = add(r, v, rp) $\Rightarrow$ OK, for some r, v, rp: Let n be the new node
added by this invocation. The status of n can be ADDED or GHOST. If $q_m$
contains a node n' with n'.round > r (since round numbers are decreasing
going towards the Root and n is a new leaf node, any existing node with a
higher round number such as n' is also conflicting with n), then the status
of n becomes GHOST by the predicate at line 11 in Algorithm 1 (otherwise,
it remains ADDED). This implies that n’s status satisfies the statement in the
lemma. This invocation may also turn the status of some set of nodes N
from ADDED to GHOST by the statement at line 13 in Algorithm 1. The nodes
in N have a lower round number than r and are conflicting with n. Therefore,
the statement of the lemma is satisfied for the nodes in N.

— $\sigma_{m+1}$ = commit(r) $\Rightarrow$ OK, for some r: For commit(r) to be successful, the
conditions at line 19 in Algorithm 1 must be satisfied. If they are satisfied, only
the status of the node n with round number r is changed from ADDED to
COMMITTED. Note that Root exists by definition and its status is COMMITTED.
Since the statuses of the rest of the nodes stay the same, the statement of the
lemma holds.


There are two cases to consider depending on whether 0,41 is an add or 
commit invocation label: 


— add(r,v,rp): This invocation label is successful if and only if the predicates 
valid(n) and valueConstraint(n) at line 8 in Algorithm 1 are satisfied after 
generating a new node n with the given inputs in the state qo]: 

e newRound(n): Due to Property 1, r 4 n’.round for any other node 
n’ € qo| and the predicate is satisfied. 

e link(n): To satisfy this predicate, there must exist a node in qj,’ with 
round rp where r, < r. By Property 3, if o contains add(r, _, rp) > OK 
with rp # 0, then add(rp,_, __) = OK also exists in ø. Hence, there 
exists a node p with round rp in qj,7;, and the predicate is satisfied. If 
rp = 0, then qo’; contains the Root node (with round 0) which ensures 
that the predicate is satisfied. 

e extendsTrunk(n): This predicate states that n extends the node n’ 
which has the highest round number among the nodes with COMMITTED 
status, if n.round > n/.round. Assume by contradiction that this is not 
the case, i.e., n.round > n’.round but n and n’ are conflicting. Let nı be 
the lowest common ancestor of n and n’ (the first common node on the 
paths from n and n’ to the Root). Since the round numbers decrease when 
going from one node towards Root, we have that n1.round < n’.round. 
If we consider the nodes on the path from n to nı, since n.round > 
n’.round, there must exist a node ng such that nz.round > n’.round 
but n2-parent.round < n’.round. The node ng in qi”; corresponds to the 
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invocation label add(nz.round,_,n2.parent.round) in g’. Moreover, the 
COMMITTED status of n’ implies the existence of commit(n’.round) in o’ 
as stated in Lemma 1. However, it is impossible that o’ contains both 
these invocation labels if Property 4 holds. 

e valueConstraint(n): It is implied trivially as Property 3a holds. 

- commit(r): It is successful if and only if the conditions at line 19 in
Algorithm 1 are satisfied. Then, by Properties 1 and 2, σ’ contains add(r, _, _)
but not commit(r). As add(r, _, _) is successful, there already exists
a node n in q_{|σ’|} whose round is r and whose status is either ADDED or
GHOST. Towards a contradiction, assume that n.status = GHOST in q_{|σ’|}. This
means that there exists a node n’ conflicting with n such that n’.round >
n.round, as stated in Lemma 1. Let n1 be the lowest common ancestor of n and
n’. Since round numbers decrease going towards the Root, n1.round <
n.round. If we consider the nodes on the path from n’ to n1, there exists a node
n2 such that n2.round > n.round and n2.parent.round < n.round. Hence,
there is an invocation label add(n2.round, _, n2.parent.round) in σ’.
However, σ’ cannot contain both of these invocation labels together according
to Property 4.
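
The path argument used in both cases above can be illustrated with a small Python sketch. It is only an illustration under assumptions: the function name `crossing_edge` and the encoding of a tree path as a list of round numbers are hypothetical and not part of the paper. Given the strictly decreasing rounds on the path from a node up to the common ancestor n1, and the round r of the conflicting node, it returns the edge (n2.round, n2.parent.round) that straddles r.

```python
from typing import List, Optional, Tuple


def crossing_edge(rounds_to_ancestor: List[int], r: int) -> Optional[Tuple[int, int]]:
    """Find the edge on a path that 'jumps over' round r.

    rounds_to_ancestor: round numbers on the path from a node up to the
    common ancestor n1, strictly decreasing (assumed encoding).
    Returns (n2.round, n2.parent.round) with n2.parent.round < r < n2.round,
    or None if no edge of the path straddles r.
    """
    for child, parent in zip(rounds_to_ancestor, rounds_to_ancestor[1:]):
        if parent < r < child:
            return (child, parent)
    return None
```

For a path with rounds 9, 7, 4, 1 and a conflicting node of round 5, the sketch returns the edge (7, 4). The corresponding label add(7, _, 4) together with commit(5) is exactly the combination that the proof rules out via Property 4.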


4 Linearization Points 


We describe an instrumentation of consensus protocols with linearization points
of successful QTree invocations, and illustrate it using Paxos as a running
example. Section 5 and Section 6 will discuss other protocols like HotStuff, Raft,
PBFT, and multi-Paxos. This instrumentation defines the mapping Γ between
actions of a protocol and QTree, such that the protocol is a
Γ-refinement of QTree. We also discuss the properties of this instrumentation which
imply that establishing Γ-refinement is an effective proof for the safety of the
protocol.

The identification of linearization points relies on the fact that protocol exe- 
cutions pass through a number of rounds, and each round goes through several 
phases (rounds can run asynchronously — processes need not be in the same 
round at the same time). The protocol imposes a total order over the phases 
inside a round and among distinct rounds. Processes executing the protocol can 
only move forward following the total order on phases/rounds. Going from one 
phase to the next phase in the same round is possible if a quorum of processes 
send a particular type of message. The refinement proofs require identifying two 
quorums for each round where a value is first proposed to be agreed upon and 
then decided. They correspond to linearization points of successful add(r, _, _)
and commit(r), respectively. Intuitively, the linearization point of
add(r, v, rp) ⇒ OK occurs when the value v is proposed as a value to agree upon
in round r. For the protocols we consider, v is determined by a designated leader
after receiving a set of messages from a quorum of processes. For single-decree
consensus, members of the quorum send the latest round number and value they
adopted (voted) in the past and the leader picks a value corresponding to the
maximum round number rp. If no one in the quorum has adopted any value yet,
then the leader is free to propose any value received from a client, and rp equals
a default value 0. For state-machine replication protocols like HotStuff or Raft,
the round rp is defined in a different manner; see Section 5 (and the full version
of this work [9]). The linearization point of commit(r) ⇒ OK occurs when a
quorum of nodes adopt (vote for) a value v proposed at round r.

By Theorem 1, proving that the order between linearization points along a
protocol execution defines a correct QTree execution reduces to showing
Properties 1-4. In general, Properties 1-3 are quite straightforward to establish and
follow from the control-flow of a process. Property 3a is specific to single-decree
consensus protocols or compositions thereof, e.g., (multi-)Paxos and PBFT. It
will not hold for Raft or HotStuff. Property 4 is related to the fact that any two
quorums of processes intersect in a correct process.

Above, we have considered the case of a protocol that is a refinement of a 
single instance of QTree. State machine replication protocols that are composed 
of multiple independent consensus instances, e.g., PBFT (see Section 6), are 
refinements of a set of QTree instances (identified using a sequence number) and 
every linearization point needs to be associated with a certain QTree instance. 


4.1 Linearization Points for Paxos 


For concreteness, we exemplify the instrumentation with linearization points 
on the single-decree Paxos protocol. We start with a brief description of this 
protocol that focuses on details relevant to this instrumentation. 

Paxos proceeds in rounds and each round has a unique leader. Since the set
of processes running the protocol is fixed and known by every process, the leader
of each round can be determined by a deterministic procedure fixed a priori
(e.g., the leader is defined as r mod N where r is the round number and N the
number of processes). For each round, the leader acts as a proposer of a value
to agree upon.

A round contains two phases. In the first phase, the leader broadcasts a 
START message to all the processes to start the round, executing the START 
action below, and processes acknowledge with a JOIN message if some conditions 
are met, executing the JOIN action: 


• START Action: The leader p of round r > 0 (the proposer) broadcasts a
START(r) message to all processes.

• JOIN Action: When a process p’ receives a START(r) message, if p’ has not
sent a JOIN or VOTE message (explained below) for a higher round in the
past⁵, it replies by sending a JOIN(r) message to the proposer. This message
includes the maximum round number (maxVotedRound) for which p’ has sent
a VOTE message in the past and the value (maxVotedValue) proposed in that
round. If it has not voted yet, these fields are 0 and ⊥.

5 Each process has a local variable maxJoinedRound that stores the maximal round
it has joined or voted for in the past and checks whether maxJoinedRound < r.

If the leader receives JOIN messages from a quorum of processes, i.e., at least
f+1 processes from a total number of 2f+1, the second phase starts. The leader
broadcasts a PROPOSE message with a value, executing the PROPOSE action
below. Processes may acknowledge with a VOTE message if some conditions are
met, executing a VOTE action. If the leader receives VOTE messages from a
quorum of processes, then the proposed value becomes decided (and sent to the
client) by executing a DECIDE action:


• PROPOSE Action: When the proposer p receives JOIN(r) messages from a
quorum of (f + 1) processes, it selects the one with the highest vote round
number and proposes its value by broadcasting a PROPOSE(r) message (which
includes that value). If there is no such highest round (all vote rounds are
0), then the proposer selects the proposed value randomly, simulating a value
given by the client (whose modeling we omit for simplicity).

• VOTE Action: When a process p’ receives a PROPOSE(r) message, if p’ has
not sent a JOIN or VOTE message for a higher round in the past, it replies
by sending a VOTE(r) message to the proposer with round number r.

• DECIDE Action: When the proposer p receives VOTE(r) messages from a
quorum of processes, it updates a local variable called decidedVal to be the value
it has proposed in this round r. This assignment means that the value is
decided and sent to the client.
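
The process-side JOIN and VOTE actions can be sketched in Python as follows. This is a minimal illustration, not the authors' model: the class name `Acceptor`, the tuple encoding of messages, and the use of `None` for ⊥ are assumptions. The JOIN guard follows footnote 5 (maxJoinedRound < r); the VOTE guard (maxJoinedRound ≤ r) is our reading of "has not sent a JOIN or VOTE message for a higher round".

```python
from typing import Optional, Tuple


class Acceptor:
    """Process-side state for single-decree Paxos (illustrative sketch)."""

    def __init__(self) -> None:
        self.max_joined_round = 0                     # maxJoinedRound
        self.max_voted_round = 0                      # maxVotedRound (0 if no vote yet)
        self.max_voted_value: Optional[str] = None    # maxVotedValue (None stands for ⊥)

    def on_start(self, r: int) -> Optional[Tuple[str, int, int, Optional[str]]]:
        """JOIN action: reply to START(r) unless a higher or equal round was joined."""
        if self.max_joined_round < r:
            self.max_joined_round = r
            return ("JOIN", r, self.max_voted_round, self.max_voted_value)
        return None

    def on_propose(self, r: int, value: str) -> Optional[Tuple[str, int]]:
        """VOTE action: reply to PROPOSE(r) unless a higher round was joined (assumed guard)."""
        if self.max_joined_round <= r:
            self.max_joined_round = r
            self.max_voted_round = r
            self.max_voted_value = value
            return ("VOTE", r)
        return None
```

A process that joins round 2, votes in round 2, and then joins round 3 reports (2, value) in its JOIN(3) message, and afterwards ignores any PROPOSE(2) message.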


Linearization points in Paxos. We instrument Paxos with linearization points 
as follows: 


- the linearization point of add(r,v,r’) ⇒ OK occurs when the proposer
broadcasts the PROPOSE(r) message containing value v after receiving a
quorum of JOIN(r) messages (during the PROPOSE action in round r). The
round r’ is extracted from the JOIN(r) message selected by the proposer.

- the linearization point of commit(r) ⇒ OK occurs when the leader of round
r updates decidedVal after receiving a quorum of VOTE(r) messages (during
the DECIDE Action).
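
The two linearization points above can be pictured as instrumentation of the proposer's PROPOSE and DECIDE actions. The Python sketch below is an illustration under assumptions: the class `Proposer`, the `trace` list that records linearization points, and the `client_value` parameter (standing in for the randomly chosen client value) are hypothetical names introduced here.

```python
from typing import List, Optional, Tuple

# A JOIN(r) payload: (maxVotedRound, maxVotedValue); (0, None) if the sender never voted.
Join = Tuple[int, Optional[str]]


class Proposer:
    """Leader of round r in single-decree Paxos, instrumented with QTree labels."""

    def __init__(self, round_no: int, f: int, client_value: str, trace: list) -> None:
        self.r = round_no
        self.quorum = f + 1            # quorum size out of 2f + 1 processes
        self.client_value = client_value
        self.trace = trace             # records linearization points in order
        self.proposed: Optional[str] = None
        self.decided_val: Optional[str] = None

    def propose(self, joins: List[Join]) -> Optional[str]:
        """PROPOSE action; linearization point of add(r, v, r')."""
        if len(joins) < self.quorum or self.proposed is not None:
            return None
        r_prime, value = max(joins, key=lambda j: j[0])
        if r_prime == 0:               # nobody in the quorum has voted yet
            value = self.client_value
        self.proposed = value
        self.trace.append(("add", self.r, value, r_prime))
        return value

    def decide(self, vote_count: int) -> Optional[str]:
        """DECIDE action; linearization point of commit(r)."""
        if self.proposed is None or vote_count < self.quorum or self.decided_val is not None:
            return None
        self.decided_val = self.proposed
        self.trace.append(("commit", self.r))
        return self.decided_val
```

Each proposer appends at most one add and one commit entry to the trace, which mirrors the "at most once per round" reading of the linearization points.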


We illustrate the definition of linearization points for Paxos in relation to QTree 
executions in the full version [9]. 


Theorem 2. Paxos refines QTree.


Proof. We show that the sequence of successful add and commit invocations 
defined by linearization points along a Paxos execution satisfies the properties 
in Theorem 1 and therefore, it represents a correct QTree execution: 


- Property 1: Each round has a unique leader and the leader follows the rules
of the protocol (no Byzantine failures), thereby making a single proposal.
Therefore, the linearization point of an add(r, _, _) ⇒ OK will occur at
most once for a round r. Since a single value can be proposed in a round,
and all processes follow the rules of the protocol, they can only vote for that
single value. Thus, at most one linearization point of commit(r) ⇒ OK can
occur for a round r.

- Property 2: This holds trivially as all the processes follow the rules of the
protocol and they need to receive a PROPOSE(r) message (which can occur
only after the linearization point of an add(r, _, _) ⇒ OK) from the leader
of round r to send a VOTE(r) message.

- Property 3: By the definition of the PROPOSE action, the proposer selects
a highest vote round number r’ from a quorum of JOIN(r) messages that
it receives, before broadcasting a PROPOSE(r) message. If such a highest
vote round number r’ > 0 exists, then there must be a VOTE(r’) message
which is a reply to a PROPOSE(r’) message. Thus, if the linearization point of
add(r, _, r’) ⇒ OK occurs where r’ ≠ 0, then it is preceded by add(r’, _, _).
Also, by the definition of JOIN, a process cannot send a JOIN(r) message
after a VOTE(r’) message if r ≤ r’.

• Property 3a: By the definition of PROPOSE, the proposer selects the
JOIN message with the highest vote round number and proposes its
value. Thus, if the linearization points of both add(r,v,r’) ⇒ OK and
add(r’,v’, _) ⇒ OK occur, then v = v’.

- Property 4: Assume by contradiction that the linearization point of
commit(r) ⇒ OK occurs along with the linearization points of add(r, _, _) ⇒ OK
and add(r’, _, r”) ⇒ OK, for some r” < r < r’. The linearization point of
commit(r) occurs because of a quorum of VOTE(r) messages sent by a set
of processes P1, and add(r’, _, r”) occurs because of a quorum of JOIN(r’)
messages sent by a set of processes P2. Since P1 and P2 must have a non-empty
intersection, by the definition of JOIN, it must be the case that r” > r,
which contradicts the hypothesis.
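
The properties checked in this proof can also be phrased as an executable check over a sequence of linearization-point labels. The Python sketch below is an illustration under assumptions: the formal statements of Properties 1-4 are those of Theorem 1, and the checks here paraphrase them from the way they are used in the proof above (including the requirement that a parent round is smaller than the round of the add). The names `Add`, `Commit`, and `check_properties` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass(frozen=True)
class Add:
    round: int
    value: str
    parent: int      # r'; 0 denotes the Root


@dataclass(frozen=True)
class Commit:
    round: int


def check_properties(labels: List[Union[Add, Commit]], single_decree: bool = True) -> List[str]:
    """Return the violated properties, in order of detection (empty list: none)."""
    violations: List[str] = []
    adds: Dict[int, Add] = {}
    commits: Set[int] = set()
    for lab in labels:
        if isinstance(lab, Add):
            if lab.round in adds:                       # P1: one add per round
                violations.append("P1")
                continue
            if lab.parent != 0:
                parent = adds.get(lab.parent)
                if parent is None or lab.parent >= lab.round:
                    violations.append("P3")             # parent add must precede
                elif single_decree and parent.value != lab.value:
                    violations.append("P3a")            # same value as parent
            if any(lab.parent < r < lab.round for r in commits):
                violations.append("P4")                 # add jumps over a commit
            adds[lab.round] = lab
        else:
            if lab.round in commits:                    # P1: one commit per round
                violations.append("P1")
                continue
            if lab.round not in adds:
                violations.append("P2")                 # commit needs a prior add
            if any(a.parent < lab.round < a.round for a in adds.values()):
                violations.append("P4")                 # commit under an existing jump
            commits.add(lab.round)
    return violations
```

For example, the sequence add(1, a, 0), commit(1), add(2, b, 0) is reported as a Property 4 violation, since round 2 would be attached to the Root although round 1 is already committed.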


The proof of Property 4 relies exclusively on the quorum of processes in 
the first phase of a round intersecting the quorum of processes in the second 
phase of a round. It is not needed that quorums in first, resp., second, phases 
of different rounds intersect. This observation is at the basis of an optimization 
that applies to non-Byzantine protocols like Flexible Paxos [18] or Raft (see the 
full version [9]). 


4.2 Inferring Safety 


The main idea behind these linearization points is that successful add and
commit invocations correspond to some process doing a step that witnesses
the receipt of a quorum of messages sent in a certain phase of a round. Intuitively,
linearization points of successful add invocations occur when some process in
some round is certain that a quorum of processes received or will receive the
same proposal (same value, parent, etc.) for the same round and acts accordingly
(sends a message). Such a proposal of a value v in a round r is denoted by the
linearization point of a successful add(r,v,r’) for some r’. On the other hand, the
linearization point of a successful commit(r) invocation occurs when a process
decides on a value in round r (e.g., after receiving a quorum of votes). Formally,
if we denote the actions of a protocol that correspond to linearization points of
successful add(r,v,r’) and commit(r) invocations using a_a and a_c, respectively,
then Γ(a_a) = add(r,v,r’) ⇒ OK and Γ(a_c) = commit(r) ⇒ OK.

When the protocol is such a Γ-refinement of QTree, it satisfies agreement
and validity. If a decision on a value v in a round r of a protocol is the
linearization point of a successful commit(r), then by Theorem 1, the corre- 
sponding QTree state contains a node n with n.round = r, n.value = v, and 
n.status = COMMITTED. For single-decree consensus, Proposition 3 ensures that 
all rounds decide on the same value. For state machine replication protocols 
like Raft and HotStuff, where the goal is to agree on a sequence of commands, 
Proposition 2 ensures that all the decided values lie on the same branch of the 
tree which ensures that all processes agree on the same sequence of commands. 

For validity, when valueConstraint(n) is considered, successful add(r,v, 0) 
invocations represent proposals of client values. Theorem 1 ensures that these 
invocations correspond to nodes n that are immediate children of Root and for 
any such node n, n.value = v. Therefore, by Proposition 1, we can conclude that 
only client values can be decided. When valueConstraint(n) is not considered, 
the fact that the value of each node is obtained from a client is ensured using 
additional mechanisms that are straightforward, e.g., a client broadcasting a 
command to all the participants in the protocol. 


5 HotStuff Refines QTree 


We present an instrumentation of HotStuff with linearization points of successful 
add and commit invocations. We use HotStuff as an example of a state machine 
replication protocol where processes agree over a sequence of commands to exe- 
cute, and any new command proposed by a leader to the other processes comes 
with a well-identified immediate predecessor in this sequence. Agreement over 
a command entails agreement over all its predecessors in the sequence. This is 
different from protocols such as multi-Paxos or PBFT, discussed in the next 
section, where commands are associated to indices in the sequence and they can 
be agreed upon in any order. Instrumentation of Raft is presented in the full 
version [9] and behaves in a similar manner. 

In HotStuff, f out of a total of N = 3f +1 processes might be Byzantine 
in the sense that they might show arbitrary behavior and send corrupt or spu- 
rious messages. However, they are limited by cryptographic protocols. HotStuff 
requires that messages are signed using public-key cryptography, which implies 
that Byzantine processes cannot imitate messages of correct (non-faulty) pro- 
cesses. Additionally, after receiving a quorum of messages, leaders must include 
certificates in their own messages to prove that a quorum has been reached. 
These certificates are constructed using threshold signature schemes and correct 
processes will not accept any message from the leader if it is not certified.
Because of Byzantine processes, HotStuff requires quorums of size 2f + 1, which
ensures that the intersection of any two quorums contains at least one correct
process.
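
The quorum-intersection claim is a counting argument: two quorums of size q among n processes share at least 2q − n members. With n = 3f + 1 and q = 2f + 1 this gives f + 1 common processes, so at least one of them is correct; with n = 2f + 1 and q = f + 1 (the crash-fault setting of Paxos) it gives 1. The following Python sketch checks the bound by brute force for small f. The function names are hypothetical and the code is only a sanity check, not part of the formal development.

```python
from itertools import combinations


def overlap_bound(n: int, q: int) -> int:
    """Closed-form lower bound on |Q1 ∩ Q2| for two size-q quorums out of n."""
    return max(0, 2 * q - n)


def min_quorum_overlap(n: int, q: int) -> int:
    """Smallest |Q1 ∩ Q2| over all pairs of size-q subsets of n processes (brute force)."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


for f in (1, 2):
    # Byzantine setting (HotStuff, PBFT): overlap of f + 1 exceeds the f faulty processes.
    assert min_quorum_overlap(3 * f + 1, 2 * f + 1) == overlap_bound(3 * f + 1, 2 * f + 1) == f + 1
    # Crash-fault setting (Paxos): any two quorums share at least one process.
    assert min_quorum_overlap(2 * f + 1, f + 1) == overlap_bound(2 * f + 1, f + 1) == 1
```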

Each process stores a tree of commands. When a node in this tree (represent- 
ing some command) is decided, all the ancestors of this node in the tree (nodes 
on the same branch) are also decided. For a node to become decided, a leader 
must receive a quorum of messages in 3 consecutive phases after the proposal.
After each quorum is established, the leader broadcasts a different certificate to 
state which quorum has been achieved and the processes update different local 
variables accordingly, with the same node (if the certificate is valid). These local 
variables are preNode, votedNode and decidedNode in the order of quorums. 

To start a new round, processes send their preNodes to the leader of the
next round in ROUND-CHANGE(r) messages and increment their round number.
After getting a quorum of messages and selecting the preNode with the highest
round, the leader broadcasts a PROPOSE(r) message with a new node (value
is taken from the client) whose parent is the selected preNode. When the
message is received by a process, it first checks if the new node extends the selected
preNode. Then it accepts the new node if the node extends its own votedNode (it
is a descendant of votedNode in the tree) or it has a higher round number than
the round number of its votedNode, and sends⁶ a JOIN(r) message with the
same content. In the second (resp., third) phase, if a quorum of JOIN(r) (resp.,
PRECOMMIT_VOTE(r)) messages is received by the leader, it broadcasts a
PRECOMMIT(r) (resp., COMMIT(r)) message, and processes update their preNode
(resp., votedNode) with the new node, sending a PRECOMMIT_VOTE(r) (resp.,
COMMIT_VOTE(r)) message. In the fourth phase, when the leader receives a
quorum of COMMIT_VOTE(r) messages, it broadcasts a DECIDE(r) message and
processes update their decidedNode accordingly. See the full version [9] for more
details.

For HotStuff, the linearization points of add and commit occur with the
broadcasts of PRECOMMIT(r) and DECIDE(r) messages, respectively, that are
valid, i.e., (1) they contain certificates for quorums of JOIN(r) or
COMMIT_VOTE(r) messages, respectively, which respect the threshold signature
scheme, and (2) they contain the same node as in those messages. More
precisely,


- the linearization point of add(r,v,r’) ⇒ OK occurs the first time a
valid PRECOMMIT(r) message containing node v is sent. r’ is the round of the
node which is the parent of v and it is contained in a previous PROPOSE(r)
message (r’ can be 0, in which case the parent of v is a distinguished root node
that exists in the initial state).

- the linearization point of commit(r) ⇒ OK occurs the first time a
valid DECIDE(r) message is sent.


6 For all received messages, a correct process also checks if the round number of the
node sent by the leader is equal to its own current round number, and it can
send only one message for each phase in each round.

Note that a Byzantine leader can send multiple valid PRECOMMIT(r) messages
that include certificates for different quorums of JOIN(r) messages. A
linearization point occurs when the first such message is sent. Even if processes reply to
another valid PRECOMMIT(r) message sent later, this later PRECOMMIT(r)
message contains the same preNode value, and their reply will have the same content.
The same holds for DECIDE(r) messages. This remark, along with the restriction
to valid messages and the fact that any two quorums intersect in at least one
correct process, implies that the sequence of successful add and commit
invocations defined by these linearization points satisfies the properties in Theorem 1
and therefore,


Theorem 3. HotStuff refines QTree. 


A detailed proof of the theorem above is given in the full version [9]. 


6 PBFT Refines QTree 


The protocols discussed above are refinements of a single instance of QTree.
State-machine replication protocols based on multi-decree consensus, like
Multi-Paxos or PBFT, can be seen as compositions of a number of single-decree
consensus instances that run concurrently, one for each index in a sequence of
commands to agree upon, and they are refinements of a set of independent QTree
instances. We describe the instrumentation of PBFT and delegate multi-Paxos
(and variants) to the full version [9].

PBFT is a multi-decree consensus protocol in which processes aim to agree 
on a sequence of values. As in HotStuff, f out of a total number of 3f +1 
processes might be Byzantine and quorums are of size at least 2f +1. To ensure 
authentication, messages are signed using public-key cryptography. Messages 
sent after receiving a quorum of messages in a previous phase include that set 
of messages as a certificate. 

A new round r starts with the leader receiving a quorum of ROUND-CHANGE(r) 
messages (like in HotStuff). Each such message from a process p includes the 
VOTE message with the highest round (similarly to the JOIN action of Paxos) 
that p sent in the past, for each sequence number that is not yet agreed by 
a quorum. For an arbitrary set of sequence numbers sn, the leader selects the 
VOTE message with the highest round and broadcasts a PROPOSE(r,sn) message 
that includes the same value as in the VOTE message or a value received from a 
client if there is no such highest round. As mentioned above, this message also 
includes the VOTE messages that the leader received as a certificate for the selec- 
tion. When a process receives a PROPOSE(r,sn) message, if r equals its current 
round, the process did not already acknowledge a PROPOSE(r,sn) message, and 
the value proposed in this message is selected correctly w.r.t. the certificate, then 
it broadcasts a JOIN(r,sn) message with the same content (this is sent to all 
processes not just the leader). If a quorum of JOIN(r,sn) messages is received 
by a process, then it broadcasts a VOTE(r,sn) message with the same content. 
If a process receives a quorum of VOTE(r,sn) messages, then the value in this 
message is decided for sn. When a process sends its highest round number VOTE 
messages to the leader of the next round (in ROUND-CHANGE messages), it also 
includes the quorum of JOIN messages that it received before sending the VOTE, 
as a certificate. 

PBFT is a refinement of a set of independent QTree instances, one instance 
for each sequence number. The linearization points will refer to a specific instance 
identified using a sequence number, e.g., sn.add(r,v,r’) denotes an add(r, v, r’)
invocation on the QTree instance sn. Therefore, 


- the linearization point of sn.add(r,v,r’) ⇒ OK occurs the first time when a
process p sends a VOTE(r, sn) message, assuming that p is honest, i.e., it
already received a quorum of JOIN(r, sn) messages with the same content. v is
the value of the VOTE(r’, sn) message that is included in the PROPOSE(r, sn)
message (it is possible that r’ = 0 and v is selected randomly).

- the linearization point of sn.commit(r) ⇒ OK occurs the first time when a
process p decides a value for sn, assuming that p is honest, i.e., it already
received a quorum of JOIN(r, sn), resp., VOTE(r, sn), messages with the
same content.


A protocol refines a set of QTree instances identified using sequence numbers
when it satisfies Properties 1-4 in Theorem 1 for each sequence number, e.g.,
Property 1 becomes: for every sn and every r, a protocol execution contains a
linearization point for at most one invocation sn.add(r, _, _) ⇒ OK and at
most one invocation sn.commit(r) ⇒ OK. A detailed proof of the following
theorem is given in the full version [9].
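
The per-instance reading of Property 1 can be sketched in Python as follows. The encoding of a linearization point as a triple (sn, op, r) and the function name `property1_violations` are assumptions made for this illustration only.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# A linearization point tagged with its QTree instance: (sn, "add" or "commit", round).
Label = Tuple[int, str, int]


def property1_violations(labels: Iterable[Label]) -> List[Label]:
    """Return the labels that repeat an (op, round) pair within the same instance sn."""
    seen = defaultdict(set)          # sn -> set of (op, round) already observed
    repeated: List[Label] = []
    for sn, op, r in labels:
        if (op, r) in seen[sn]:
            repeated.append((sn, op, r))
        seen[sn].add((op, r))
    return repeated
```

The same round may appear in several instances without violating the property: add in round 1 for sn = 1 and add in round 1 for sn = 2 are independent, whereas two add labels for round 1 in instance sn = 1 are reported.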


Theorem 4. PBFT refines a composition of independent QTree instances. 


7 Discussion 


Protocols considered in this work can be grouped under three classes:
single-decree consensus (Paxos), multi-decree consensus (PBFT, Multi-Paxos) and
state machine replication (Raft, HotStuff)⁷. We show that they all refine QTree:
a single instance for Paxos and HotStuff, and a set of independent instances
(one for each sequence number in a command log) for PBFT, Multi-Paxos, and
Raft. The more creative parts of the refinement proofs are the identification of
add and commit linearization points and establishing Property 4 in Theorem 1,
which follows from the intersection of quorums achieved in different phases of a
round. The other three properties in Theorem 1, which guarantee that the
linearization points are correct, are established in a rather straightforward manner, based
on the control-flow of a process participating in the protocol.

The linearization points of successful add and commit invocations correspond
to some process doing a step that witnesses the receipt of a quorum of messages
sent in a certain phase of a round, e.g., the leader broadcasting a PROPOSE(r)
message in Paxos entails that a quorum of JOIN(r) messages have been sent in
the first phase and received. Protocols vary in the total number of phases in a
round, and the phases for which quorums of sent messages should be received in
order to have a linearization point of add or commit. A summary is presented in
Table 1. The * on the total number of phases means that the first phase is skipped
in rounds where the leader is stable. For Multi-Paxos and Raft, if the first phase
is skipped, then the linearization point of an add is determined by a quorum of
received messages sent in the next phase (and coincides with the linearization
point of a commit). We use “1/2” to denote this fact. In PBFT and HotStuff,
due to Byzantine processes, quorums of messages sent in two consecutive phases
need to be received in order to ensure that the processes are going to vote on
the same valid proposal. The 3rd phase in HotStuff is used to ensure progress
and can be omitted when reasoning only about safety.

7 This is a slight abuse of terminology since multi-decree consensus protocols are
typically used to implement state machine replication.


Table 1: Summary of linearization point definitions. For each protocol, we give
the total number of phases in a round and the number of the phase for which
a quorum of sent messages should be received in order to have a linearization
point of add or commit.

Class                 Protocol      #Phases   add Quorum Phase   commit Quorum Phase
Single-Decree Cons.   Paxos         2         1                  2
Multi-Decree Cons.    Multi-Paxos   2*        1/2                2
Multi-Decree Cons.    PBFT          3*        2                  3
State Machine Repl.   Raft          2*        1/2                2
State Machine Repl.   HotStuff      4         2                  4


8 Conclusion and Related Work 


We have proposed a new methodology for proving safety of consensus or state- 
machine replication protocols, which relies on a novel abstraction of their dy- 
namics. This abstraction is defined as a sequential QTree object whose state rep- 
resents a global view of a protocol execution. The operations of QTree construct 
a tree structure and model agreement on values or a sequence of state-machine 
commands as agreement on a fixed branch in the tree. Our methodology applies 
uniformly to a range of protocols like (multi-)Paxos, HotStuff, Raft, and PBFT. 
We believe that this abstraction helps in improving the understanding of such 
protocols and writing correct implementations or optimizations thereof. 

As a limitation, it is not clear whether QTree applies to protocols such as 
Texel [31] which do not admit a decomposition in rounds. As future work, we 
might explore the use of QTree in reasoning about liveness. This would require 
some fairness condition on infinite sequences of add/commit invocations, and a 
suitable notion of refinement which ensures that infinite sequences of protocol 
steps cannot be mapped to infinite sequences of stuttering QTree steps. 

The problem of proving the correctness of such protocols has been studied in 
previous work. We give an overview of the existing approaches that starts with 
safety proof methods based on refinement, which are closer to our approach. 


Refinement based safety proofs. Verdi [35] is a framework for implementing 
and verifying distributed systems that contains formalizations of various network 
semantics and failure models. Verdi provides system transformers useful for
refining high-level specifications to concrete implementations. As a case study, it
includes a fully-mechanized correctness proof of Raft [36]. This proof consists
of 45000 lines of proof code (manual annotations) in the Coq language for a
5000-line Raft implementation, showing the difficulty of reasoning about
consensus protocols and the manual effort required. IronFleet [17] uses TLA [22] style
transition-system specifications and refines them to low-level implementations
described in the Dafny programming language [25]. Boichat et al. [3] define a class
of specifications for consensus protocols, which are more abstract than QTree 
and can make correctness proofs harder. Proving Paxos in their case is reduced 
to a linearizability proof towards an abstract specification, which is quite com- 
plex because the linearization points are not fixed, they depend on the future of 
an execution. As a possibly superficial quantitative measure, their Paxos proof 
reduces to 7 lemmas that are formalized by Garcia-Perez et al. [12,13] in 12 
pages (see Appendix B and C in [13]), much more than our QTree proof. Our 
refinement proof is also similar to a linearizability proof, but the linearization 
points in our case are fixed (do not depend on the future of an execution) which 
brings more simplicity. In principle, the specifications in [3] could apply to more 
protocols, but we are not aware of such a case. The inductive sequentialization 
proof rule [20] is used for a fully mechanized correctness proof of a realistic 
Paxos implementation. This implementation is proved to be a refinement of a 
sequential program which is quite close to the original implementation, much 
less abstract than QTree, and relies on commutativity arguments implied by the 
communication-closed round structure [11]. A similar idea is explored in [14], 
but in a more restricted context. 


Inductive invariant based safety proofs. Ivy [30] is an SMT-based safety 
verification tool that can be used for verifying inductive invariants about global 
states of a distributed protocol. In order to stay in a decidable fragment of 
first-order logic, both the modeling and the specification languages of Ivy are
restricted. A simple model of Paxos obeying these restrictions is proven correct 
in [29]. 

Beyond safety. The TLA+ infrastructure [22] of Lamport has been used to ver- 
ify both safety and liveness (termination) of several variations of Paxos, e.g., Fast 
Paxos [23] or Multi-Paxos [6]. Bravo et al. [4] introduce a generic synchronization 
mechanism for round changes, called the view synchronizer, which guarantees 
liveness for various Byzantine consensus protocols including our case studies
HotStuff and PBFT. This work includes full correctness proofs for single-decree
versions of HotStuff and PBFT and a two-phase version of HotStuff. PSync [10]
provides a partially synchronous semantics for distributed protocols assuming
communication-closed rounds in the Heard-Of model [8]. PSync is used to prove
both safety and liveness of a Paxos-like consensus protocol called LastVoting.


Relating different consensus protocols. Lamport defines a series of refine- 
ments of Paxos that leads to a Byzantine fault tolerant version, which is refined 
by PBFT [24]. Our proof that Paxos refines QTree can be easily extended to 
this Byzantine fault tolerant version in the same manner as we did for PBFT. 
Wang et al. [34] show that a variation of Raft is a refinement of Paxos, which
enables porting some Paxos optimizations to Raft. Renesse et al. [32] compare
Paxos, Viewstamped Replication [28] and ZAB [19]. They define a rooted tree of 
specifications represented in TLA style whose leaves are concrete protocols. Each 
node in this tree is refined by its children. Common ancestors of concrete pro- 
tocols show similarities whereas conflicting specifications show the differences. 
Similarly, [33] shows that Paxos, Chandra-Toueg [7] and Ben-Or [2] consensus 
algorithms share common building blocks. Aublin et al. [1] propose an abstract 
data type for specifying existing and possible future consensus protocols. Unlike 
our QTree, core components of this data type are not implemented and inten- 
tionally left abstract so that it can adapt to different network and process failure 
models. 
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Abstract. Multiparty Session Types (MPST) are a typing discipline 
for communication-centric systems, guaranteeing communication safety, 
deadlock freedom and protocol compliance. Several works have emerged 
which model failures and introduce fault-tolerance techniques. However, 
such works often make assumptions on the underlying network, e.g., as- 
suming TCP-based communication where messages are guaranteed to 
be delivered; or adopting centralised reliable nodes and ad-hoc notions 
of reliability; or only addressing a single kind of failure, such as node 
crashes. In this work, we develop MAGπ, a Multiparty, Asynchronous 
and Generalised π-calculus, which is the first language and type system 
to accommodate in unison: (i) the widest range of non-Byzantine faults, 
including message loss, delays and reordering; crash and link failures; 
and network partitioning; (ii) a novel and most general notion of reliability, 
taking into account the viewpoint of each participant in the protocol; 
(iii) a spectrum of network assumptions from the lowest UDP-based 
network programming to the TCP-based application level. We prove sub- 
ject reduction and session fidelity; process properties (deadlock freedom, 
termination, etc.); failure-handling safety and reliability adherence. 


Keywords: Session types · Distributed protocols · Failures · Timeouts 


1 Introduction 


Despite large investments into fault-prevention techniques, failures still regularly 
occur in complex distributed applications. It is widely agreed that traditional 
methods of verification using software testing do not provide high levels of confi- 
dence in the correctness of distributed algorithms. This is mainly due to the non- 
deterministic behaviour inherent to these protocols, which makes it unfeasible to 
manually test for all edge cases. This problem is bypassed by using exhaustive 
techniques such as model checking [9,31], capable of exploring the entirety of 
the state space of a program to verify its correctness. However, building suit- 
able models for complex distributed algorithms is arduous, expensive, and often 
intractable (due to the state explosion problem [10]). Furthermore, even if an 
algorithm is successfully encoded into a suitable model and checked, guarantees 
of correctness are on the design of the algorithm, and not on the software 
implementation; handwritten code is still prone to human error. In contrast, types 
and type systems [29] are lightweight forms of verification. Built into programming 
languages, types provide guarantees directly on handwritten code and aid 
developers in implementing software which is correct by construction. Specific 
to concurrent and distributed computing, session types [14,35,15,36,33,16] have 
quickly grown in popularity since their initial conceptualisation [14], spanning 
from binary (two participants) to multiparty (many participants). 


Session types enforce that processes communicate according to a protocol 
specification. Consequently, desirable properties about communication, e.g., type 
safety (communication occurs error-free), protocol compliance (or session fidelity; 
processes behave according to their predefined protocol), and deadlock free- 
dom (processes do not get stuck waiting), can be statically determined by a 
type checker. To this aim, session types have been implemented in various pro- 
gramming languages, including Java [18,11], Go [21], Haskell [17,27], Scala [32], 
Rust [19], Elixir [34]. 


To date, most session type theories are designed for concurrent, as opposed to 
distributed processes, i.e., it is commonly assumed that communication failures 
do not occur. For the few (and rapidly increasing) works that do consider failures, 
heavy assumptions are made that impede their viability for realistic complex dis- 
tributed applications. E.g., asynchronous theories [24,16,33] use message buffers 
to model distributed communication under “TCP-like” assumptions: messages 
are guaranteed to be delivered and messages from a single sender do not get re- 
ordered. Affine sessions [25,12,6] only allow failure-handling of application level 
failures through try-catch blocks; there is no support for arbitrary failures that 
may stem from hardware faults, network inconsistencies etc. Coordinator model 
approaches [1,8,37] assume some degree of reliability, be it as a central resilient 
process, a reliable broadcast, or fixed synchronisation points. 


The harsh reality is that many real-world distributed protocols (e.g., con- 
sensus algorithms) cannot assume any of these conditions. Networks introduce 
many points of failure into a system: nodes may crash, messages can be dropped, 
delayed or duplicated, links between nodes may fail etc. Designers of distributed 
protocols have acknowledged that failure is inevitable, and so algorithms are 
designed to withstand a threshold of failure whilst still achieving their expected 
behaviour—known as fault-tolerance [22]. Examples of fault-tolerant protocols 
(extensively) used today include the Paxos [20] and Raft [26] consensus algo- 
rithms, which assume the possibility of all non-Byzantine faults—i.e., node 
crashes, link failures, network partitions, and message inconsistencies. 


Although the correctness of these algorithms has been heavily studied, many 
of them are developed with limited confidence in the correctness of the deploy- 
able artifact, due to the reasons previously outlined. To fill this gap, we need 
type-based verification, which can be made available to programming languages, 
thus supporting designers and developers in designing and implementing cor- 
rect distributed algorithms. While (multiparty) session types have made great 
impact in modelling structured communication and guaranteeing relevant 
properties, their theory is not yet expressive enough to model these complex algorithms. 
In this paper, we take steps towards filling this gap by presenting MAGπ, a 
Multiparty, Asynchronous and Generalised π-calculus, which is the first language and 
type system able to accommodate: (i) the widest range of non-Byzantine faults, 
including message loss, delays and reordering; crash and link failures; and 
network partitioning, all by using timeouts; (ii) a novel and most general notion of 
reliability, taking into account the viewpoint of each participant in the protocol; 
and (iii) a spectrum of network assumptions, from the lowest level of network 
programming based on UDP to the application level based on TCP. 


Example 1 (Ping Pong: Types). We illustrate MAGπ with a simplified version 
of the ping utility from the Internet Control Message Protocol (ICMP¹), which 
is our running example. The ping utility consists of a total of three roles 
communicating amongst each other: two roles, p and r, communicate reliably with each 
other, and both communicate unreliably with a third role q. Our definition of 
reliability (§ 3.2) takes into account the viewpoint of each role, thus allowing roles 
to have their own (possibly empty) reliability set. Following the assumptions 
above, the reliability set for p is {r}, for r is {p}, and for q is ∅. 

Below we give the session types, denoted S_r, S_p and S_q, for roles r, p and q 
respectively. 


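\[
\begin{array}{rcl}
S_r & = & \&\{\, p?\mathsf{ok}().\mathsf{end},\ p?\mathsf{ko}().\mathsf{end} \,\} \\[4pt]
S_p & = & q!\mathsf{ping}().\ \&\{\, q?\mathsf{pong}().r!\mathsf{ok}().\mathsf{end}, \\
    &   & \quad ⏲.\ q!\mathsf{ping}().\ \&\{\, q?\mathsf{pong}().r!\mathsf{ok}().\mathsf{end}, \\
    &   & \qquad ⏲.\ q!\mathsf{ping}().\ \&\{\, q?\mathsf{pong}().r!\mathsf{ok}().\mathsf{end},\ ⏲.\ r!\mathsf{ko}().\mathsf{end} \,\}\,\}\,\} \\[4pt]
S_q & = & \&\{\, p?\mathsf{ping}().p!\mathsf{pong}().\mathsf{end}, \\
    &   & \quad ⏲.\ \&\{\, p?\mathsf{ping}().p!\mathsf{pong}().\mathsf{end}, \\
    &   & \qquad ⏲.\ \&\{\, p?\mathsf{ping}().p!\mathsf{pong}().\mathsf{end},\ ⏲.\ \mathsf{end} \,\}\,\}\,\}
\end{array}
\]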


Role r is the receiver (&, called branching), which waits on two options: it receives 
from p either the label ok or ko and then terminates the protocol (end). Role p is 
the sender (⊕², called selection), and it tries to obtain information on the status of 
q. It begins by sending a ping message to q (q!ping()), then waits to receive from q. 
If a pong is received (q?pong()) in the top branch, then it concludes that q is 
reachable and sends this information to r (r!ok()), after which it terminates. 
Alternatively, p enters a timeout branch (⏲). For simplicity, we assume p will attempt 
to communicate with q three times (shown by the three levels of nesting of the timeout 
branch) before assuming q is unreachable, after which the session also terminates 
by sending ko to r, followed by end. Along the same lines, the protocol for role q is 
given by the session type S_q, where its timeout branches match the timeouts from S_p. 
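
As an illustrative aside, the three local types of the running example can be written down as plain data. The sketch below is not part of MAGπ; the class names (`End`, `Send`, `Branch`) and helper functions are assumptions introduced only for this illustration. It builds the types of p and q with three attempts each and counts how deeply the timeout branches are nested.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class End:
    """The terminated type `end`."""


@dataclass(frozen=True)
class Send:
    """Shorthand selection towards a single role: role!label().cont"""
    role: str
    label: str
    cont: "SType"


@dataclass(frozen=True)
class Branch:
    """Branching over (role, label, continuation) options, with an optional timeout."""
    options: Tuple[Tuple[str, str, "SType"], ...]
    timeout: Optional["SType"] = None


SType = Union[End, Send, Branch]


def ping_attempts(n: int) -> SType:
    """Type of p: ping q, then wait for a pong or time out; n attempts in total."""
    on_timeout = Send("r", "ko", End()) if n == 1 else ping_attempts(n - 1)
    reachable = ("q", "pong", Send("r", "ok", End()))
    return Send("q", "ping", Branch((reachable,), timeout=on_timeout))


def ping_waits(n: int) -> SType:
    """Type of q: wait for a ping up to n times, replying with pong if one arrives."""
    on_timeout = End() if n == 1 else ping_waits(n - 1)
    reply = ("p", "ping", Send("p", "pong", End()))
    return Branch((reply,), timeout=on_timeout)


def timeout_depth(t: SType) -> int:
    """Number of nested timeout branches along the path that always times out."""
    if isinstance(t, Send):
        return timeout_depth(t.cont)
    if isinstance(t, Branch) and t.timeout is not None:
        return 1 + timeout_depth(t.timeout)
    return 0


S_r = Branch((("p", "ok", End()), ("p", "ko", End())))  # reliable branch, no timeout
S_p = ping_attempts(3)
S_q = ping_waits(3)
```

With this encoding, the timeout nesting of the types for p and q can be compared directly, mirroring the remark that the timeout branches of q match those of p.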


1 https://www.rfc-editor.org/rfc/rfc792 
2 For readability, we adopt a shorthand notation for sending towards a single role and 
for payloads of type unit, such that ⊕{s!m(unit).S} is represented by s!m().S. 
1.1 Contributions 


We now present our contributions w.r.t. our Multiparty, Asynchronous, and 
Generalised π-calculus (MAGπ). 


1. MAGπ language (§ 2): 

- MAGπ is the first language to support the widest set of non-Byzantine 
faults, including message loss, message delays and message reordering; 
crash failures and link failures; and network partitioning. 

- MAGπ is the first language to introduce timeouts in receive branches 
(used for handling network failures), as well as to support undirected 
branching in a generalised setting, i.e., the ability to simultaneously expect 
an incoming message from more than one sender. 

2. MAGπ type system (§ 3): 

- is a conservative extension of a generalised asynchronous MPST theory [33], 
benefiting from: the ability to model more protocols than traditional 
syntactic theories (e.g. global types); and the flexibility of checking 
desired properties, such as deadlock freedom or termination, a posteriori, 
as opposed to during the design phase. 

- supports undirected branching/selection and is the first type system 
to introduce timeout branches. 

- supports a novel and most general reliability definition (§ 3.2), taking 
into account the viewpoint of each participating role, and is built on 
optional role-dependent reliability assumptions. 

3. Type properties (§ 4): We prove subject reduction (theorem 1) and session 
fidelity (theorem 2). We show failure-handling safety (cor. 1) and its inverse, 
reliability adherence (cor. 2), which strictly connect timeouts and reliability. 
We prove process properties (theorem 3) e.g. deadlock-freedom, termination, 
liveness, in line with [33]. Finally, as our MAGπ type system is Turing-complete, 
we prove decidable type checking (theorem 4) and decidability of 
process properties for finite message buffers (theorem 5). 

4. TCP vs. UDP (§ 5): MAGπ is expressive enough to capture a range of 
network assumptions: from low-level network programming operating over 
the User Datagram Protocol (UDP); to application-level software operating 
over the Transmission Control Protocol (TCP). 

5. Case study (§ 6): we further demonstrate the use of timeouts and undi- 
rected branching to model a Domain Name System (DNS) distributed pro- 
tocol with a cache and in-built load-balancer; we also show the properties it 
satisfies, including safety and deadlock freedom. Further examples are avail- 
able in the technical report [23], including a prototype specification of a 
leader election algorithm used by consensus protocols. 


2 MAGπ: Language 


We present a multiparty session π-calculus, based on the theory of Scalas and 
Yoshida [33], extended to accurately model real-world distributed network envi- 
ronments. We assume the lowest level of abstraction—the only failure detection 
mechanism available to a process is an upper-bound wait limit, i.e., a timeout. 
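\[
\begin{array}{rcll}
c & ::= & x \;\mid\; s[p] & \text{(variable, session w/ role)} \\
d & ::= & v \;\mid\; c & \text{(basic value, variable, session w/ role)} \\
w & ::= & v \;\mid\; s[p] & \text{(basic value, session w/ role)} \\
P, Q & ::= & \mathbf{0} \;\mid\; (\nu s)P & \text{(inaction, restriction)} \\
 & \mid & P \mid Q \;\mid\; P + Q & \text{(composition, non-deterministic choice)} \\
 & \mid & c \oplus [q]!m(d).P & \text{(selection towards role } q\text{)} \\
 & \mid & c \,\&_{i \in I}\{[q_i]?m_i(x_i).P_i\} & \text{(reliable branching from roles } q_i\text{)} \\
 & \mid & c \,\&_{i \in I}\{[q_i]?m_i(x_i).P_i,\ ⏲.Q\} & \text{(branching from roles } q_i \text{ w/ timeout } Q\text{)} \\
 & \mid & \mathsf{def}\ D\ \mathsf{in}\ P \;\mid\; X(\tilde{d}) & \text{(process definition, process call)} \\
 & \mid & s : \sigma & \text{(session buffer)} \\
D & ::= & X(\tilde{x}) = P & \text{(process declaration)} \\
\sigma & ::= & (p \triangleright q \triangleleft m(w)) \cdot \sigma \;\mid\; \epsilon & \text{(session queue, empty session queue)}
\end{array}
\]


Fig. 1. Syntax for MAGπ 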


Our calculus presents three novel features: (i) the new timeout primitive; 
(ii) the capability of expecting a message from different senders; and (iii) oper- 
ational semantics which can model various non-Byzantine failures. Timeouts can 
be attached to receive actions—henceforth referred to as branches—and are used 
to describe an alternative process to be executed in case failures are assumed to 
occur (akin to error handlers). 

Failures are said to be assumed, as opposed to detected, since we model the 
impossibility of distinguishing between a delayed and a lost message. Thus, 
it is possible for a process to time out prematurely without its corresponding 
message having been lost, just as in the real world! 

The benefit of our approach is that the failure detection mechanism is agnos- 
tic to the type of fault, allowing us to model in unison message loss, message 
delay, crash-stop failures, link failures, and network partitions. 


2.1 Syntax 


Definition 1 (Language Syntax). The Multiparty, Asynchronous and 
Generalised π-calculus syntax is defined by the grammar in fig. 1. 


Communication happens over sessions (s, s') between a number of roles (p, q) 
ranging over set ρ. The primitives of the calculus are sessions with roles s[p], 
and basic values v, both of which can be abstracted using variables (x, y). 
Processes (P, Q) include the following standard constructs: (i) inaction 0 
represents process termination; (ii) session restriction (νs)P binds a new session s 
in P; (iii) parallel composition declares two concurrent processes; (iv) selection 
c ⊕ [q]!m(d).P uses channel c to send a message to q with label m and payload 
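\[
\begin{array}{ll}
\text{[R-$\oplus$]} & s[p] \oplus [q]!m(w).P \;\mid\; s:\sigma \;\to\; P \;\mid\; s:\sigma \cdot (p \triangleright q \triangleleft m(w)) \cdot \epsilon \\[4pt]
\text{[R-\&]} & s[q] \,\&_{i \in I}\{[p_i]?m_i(x_i).P_i\ [, ⏲.Q]\} \;\mid\; s:(p_k \triangleright q \triangleleft m_k(w)) \cdot \sigma \\
 & \qquad \to\; P_k[w/x_k] \;\mid\; s:\sigma \qquad \text{for } k \in I \\[4pt]
\text{[R-⏲]} & s[q] \,\&_{i \in I}\{[p_i]?m_i(x_i).P_i,\ ⏲.Q\} \;\mid\; s:\sigma \;\to\; Q \;\mid\; s:\sigma \\[4pt]
\text{[R-+]} & P_1 + P_2 \;\to\; P_i \qquad \text{for } i \in \{1, 2\} \\[4pt]
\text{[R-X]} & \mathsf{def}\ X(x_1, \ldots, x_n) = P\ \mathsf{in}\ (X(w_1, \ldots, w_n) \mid Q) \\
 & \qquad \to\; \mathsf{def}\ X(x_1, \ldots, x_n) = P\ \mathsf{in}\ (P[w_1/x_1] \cdots [w_n/x_n] \mid Q) \\[4pt]
\text{[R-C]} & P \to P' \;\Longrightarrow\; C[P] \to C[P'] \\[4pt]
\text{[R-$\downarrow$]} & s : h \cdot \sigma \;\to\; s : \sigma
\end{array}
\]


Fig. 2. Reduction rules for MAGπ 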


d, after which the process continues according to P; (v) definition and 
declaration allow processes to be assigned names, modelling recursion through the 
use of process calls. We now elaborate on the novelties in our language. 


$c \,\&_{i \in I}\{[q_i]?m_i(x_i).P_i\}$  Quantification over roles in a branch allows processes 
to receive from one in a range of other roles. This has practical applications 
in a multitude of distributed protocols, e.g. load balancers. 

$c \,\&_{i \in I}\{[q_i]?m_i(x_i).P_i,\ ⏲.Q\}$  Timeouts are used as a failure detection 
mechanism in receive branches. If a failure is assumed to have lost or prevented the 
incoming message, then process Q is initiated. It is key to note that timeouts 
are non-deterministic: they model an arbitrary and unknown duration 
of time a process waits before assuming a failure has occurred. 

P + Q  Non-deterministic choice randomly picks between two possible process 
continuations. We use this construct to simplify examples for better 
presentation. Concretely, it replaces the need for expressions and if-then-else 
constructs, which are routine and orthogonal to our formulation. 

s : σ  Message buffer for session s. An entry in the buffer $(p \triangleright q \triangleleft m(w))$ models 
a message "in transit" from role p to q with label m and payload w. This is 
needed to accommodate asynchrony in our language. 


2.2 Operational Semantics 
We begin with definitions of a reduction context and buffer congruence. 


Definition 2 (Reduction Context). A reduction context C abstracts away 
an outer environment from a process, and is given by: 


\[ C ::= C \mid P \;\;\mid\;\; (\nu s)C \;\;\mid\;\; \mathsf{def}\ D\ \mathsf{in}\ C \;\;\mid\;\; [\,] \]


Hence, C[P] refers to process P under some arbitrary context C. 
Definition 3 (Buffer Congruence). A process containing only a buffer 
under its restriction is congruent to inaction. Message buffers observe total 
reordering. 


\[ (\nu s)\, s:\sigma \;\equiv\; \mathbf{0} \qquad\qquad s:\sigma_1 \cdot h_1 \cdot h_2 \cdot \sigma_2 \;\equiv\; s:\sigma_1 \cdot h_2 \cdot h_1 \cdot \sigma_2 \]


Definition 4 (OS). The operational semantics for MAGπ is given via a 
reduction relation → inductively defined in fig. 2, together with standard structural 
congruence rules [33] and two buffer congruence rules defined in def. 3. 


Let us now comment on the reduction rules (fig. 2). Processes send messages 
using the selection rule [R-⊕]; this adds the sent message as a new entry in the 
session buffer, and advances the process to its continuation. Sent messages are 
read from the buffer using the branching rule [R-&]. If the receiver has a valid 
branch matching the sender and message label, then it advances to the specific 
continuation of said branch (a timeout branch for this rule is optional). The 
substitution $P_k[w/x_k]$ denotes the replacement of variable $x_k$ with the payload 
value w in the continuation process $P_k$. The timeout rule [R-⏲] advances 
processes to their timeout branch without changing the buffer. Non-deterministic 
choice is resolved using the choice rule [R-+], which advances the process to 
one of the two possible continuations. The call rule [R-X] replaces a process 
call with its defined process, substituting each parameter. Processes can reduce 
under a context using the context rule [R-C]. Lastly, messages can be lost 
from the buffer with the drop rule [R-↓]. 
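
To make the buffer-level effect of these rules concrete, the following is a minimal sketch, assuming a session buffer represented as a Python list. The names `Msg`, `send`, `receive`, `time_out` and `drop` are hypothetical helpers for this illustration, not part of the calculus, and continuations are represented simply by names.

```python
from collections import namedtuple

# A buffer entry: a message in transit from `sender` to `receiver`.
Msg = namedtuple("Msg", ["sender", "receiver", "label", "payload"])


def send(buffer, msg):
    """Selection: add the sent message to the session buffer."""
    return buffer + [msg]


def receive(buffer, receiver, branches):
    """Branching: consume a buffered message matching one of the branches.

    `branches` maps (sender, label) to a continuation name. Buffers are
    considered up to reordering, so a matching message may sit anywhere in
    the list; this sketch picks the first match, which is only one of the
    possible reductions. Returns (continuation, payload, new_buffer), or
    None if no message matches.
    """
    for i, m in enumerate(buffer):
        if m.receiver == receiver and (m.sender, m.label) in branches:
            rest = buffer[:i] + buffer[i + 1:]
            return branches[(m.sender, m.label)], m.payload, rest
    return None


def time_out(buffer, timeout_continuation):
    """Timeout: move to the timeout continuation; the buffer is unchanged."""
    return timeout_continuation, buffer


def drop(buffer, index):
    """Drop: lose one in-transit message."""
    return buffer[:index] + buffer[index + 1:]


# p sends a ping to q; q may receive it, or it may be lost before q reads it.
buf = send([], Msg("p", "q", "ping", None))
delivered = receive(buf, "q", {("p", "ping"): "P_pong"})
lost = receive(drop(buf, 0), "q", {("p", "ping"): "P_pong"})
```

Note that `time_out` can fire even while a matching message is still in the buffer, which is how a premature timeout arises in this reading.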

We now unpack how our semantics deals with failures. The reduction rules in 
fig. 2 allow various forms of failures to be modelled, stemming from the versatility 
and elegance of the drop rule [R-↓]. The following elaborates on how this rule 
can be utilised to model different types of failure: 


- Message loss is modelled directly through the reduction rule [R-↓]. 

- Crash-failure is modelled through repeated applications of [R-↓] for a 
particular role. E.g., to model a crash of role p, the reduction step [R-↓] should be 
applied to all messages that enter the buffer matching the pattern $(p \triangleright \_ \triangleleft \_(\_))$ 
(_ symbolises a "don't care" value). 

- Link-failure is modelled using a similar method; the difference being that 
messages between two specific roles are dropped. E.g., modelling a 
link-failure between roles p and q requires [R-↓] to be applied to all messages 
entering the buffer with the patterns $(p \triangleright q \triangleleft \_(\_))$ and $(q \triangleright p \triangleleft \_(\_))$. 

- Message delay is modelled by applying rule [R-⏲] to a branch whilst a valid 
message resides in the buffer. E.g.: 

\[
\begin{array}{l}
s[q] \,\&_{i \in I}\{[p_i]?m_i(x_i).P_i,\ ⏲.Q\} \;\mid\; s:(p_k \triangleright q \triangleleft m_k(w)) \cdot \sigma \\
\qquad \to\; Q \;\mid\; s:(p_k \triangleright q \triangleleft m_k(w)) \cdot \sigma \qquad \text{for } k \in I.
\end{array}
\]

- Total message reordering is modelled via buffer congruence rules (def. 3). 
- Network partitions can be represented using multiple link failures. 
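
The way the coarser failures in the list above are built from single message drops can be sketched as follows. This is only an illustration under stated assumptions: messages are plain (sender, receiver, label) tuples, the function names are hypothetical, and each function acts on a snapshot of the buffer, whereas the description above applies the drop rule to every matching message that enters the buffer over time.

```python
def drop_matching(buffer, lost):
    """Apply the drop rule to every buffered message for which `lost` holds."""
    return [m for m in buffer if not lost(m)]


def crash(buffer, role):
    """Crash of `role`: every message sent by it is dropped."""
    return drop_matching(buffer, lambda m: m[0] == role)


def link_failure(buffer, a, b):
    """Failure of the link between a and b: both directions are dropped."""
    return drop_matching(buffer, lambda m: {m[0], m[1]} == {a, b})


def partition(buffer, side_a, side_b):
    """Network partition, expressed as link failures across the two sides."""
    for a in side_a:
        for b in side_b:
            buffer = link_failure(buffer, a, b)
    return buffer


# Snapshot of a buffer from the ping example (labels only, unit payloads).
buf = [("p", "q", "ping"), ("q", "p", "pong"), ("p", "r", "ok")]
after_crash_of_p = crash(buf, "p")
after_pq_link_failure = link_failure(buf, "p", "q")
after_partition = partition(buf, {"q"}, {"p", "r"})
```

To follow the description more closely, the same predicate would be re-applied whenever a new message is added to the buffer.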
The granularity at which we model failures allows for degrees of 
customisation. E.g., benign fault-tolerant consensus algorithms typically assume the 
possibility of all non-Byzantine faults, therefore all the aforementioned failures 
are required. Alternatively, an application assumed to run over a trusted TCP 
network need not worry about single message drops, and hence [R-↓] should 
only be applied to model node crash and link failures. 


Definition 5 (Well-formedness). To ensure that communication is possible, 
we require that a well-formed process has a buffer for each session, i.e., 


\[ P \equiv (\nu s)Q \;\Longrightarrow\; Q \equiv (\nu s')(Q' \mid s:\sigma) \]


Def. 5 introduces a well-formedness condition to guarantee that a session 
always guards its buffer, hence ensuring that messages always have a queue to 
be placed in. From now on, we will only consider well-formed processes. 

Before concluding this section, we recall our ping pong running example from 
the introduction, and present below the processes for roles p, q and r. 


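Example 2 (Ping Pong: Processes). 


\[
\begin{array}{rcl}
P_p & = & s[p] \oplus [q]!\mathsf{ping}().\ s[p]\,\&\{\,[q]?\mathsf{pong}().P_p^{ok}, \\
    &   & \quad ⏲.\ s[p] \oplus [q]!\mathsf{ping}().\ s[p]\,\&\{\,[q]?\mathsf{pong}().P_p^{ok}, \\
    &   & \qquad ⏲.\ s[p] \oplus [q]!\mathsf{ping}().\ s[p]\,\&\{\,[q]?\mathsf{pong}().P_p^{ok},\ ⏲.P_p^{ko}\,\}\,\}\,\} \\[4pt]
P_p^{ok} & = & s[p] \oplus [r]!\mathsf{ok}().\mathbf{0} \qquad\qquad P_p^{ko} \;=\; s[p] \oplus [r]!\mathsf{ko}().\mathbf{0} \\[4pt]
P_q & = & s[q]\,\&\{\,[p]?\mathsf{ping}().P_q^{pong}, \\
    &   & \quad ⏲.\ s[q]\,\&\{\,[p]?\mathsf{ping}().P_q^{pong}, \\
    &   & \qquad ⏲.\ s[q]\,\&\{\,[p]?\mathsf{ping}().P_q^{pong},\ ⏲.\mathbf{0}\,\}\,\}\,\} \\[4pt]
P_q^{pong} & = & s[q] \oplus [p]!\mathsf{pong}().\mathbf{0} \\[4pt]
P_r & = & s[r]\,\&\{\,[p]?\mathsf{ok}().\mathbf{0},\ [p]?\mathsf{ko}().\mathbf{0}\,\}
\end{array}
\]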


3 MAGπ: Type System 


We introduce the type system for MAGπ, which is a conservative extension of 
the generalised asynchronous MPST theory [33, sec. 7]. Generalised MPST theory 
strays away from global protocol specifications (global types) and instead operates on 
user-defined localised specifications of each participating role (local types). The 
benefits of working with such theory include: (i) the ability to capture a larger 
set of viable protocols compared to traditional syntactic methods (e.g. global 
types) of enforcing consistent communication; (ii) the ability to model 
protocols of different requirements. In particular, instead of syntactically forcing 
programmers to write, e.g., deadlock-free code, a generalised theory allows 
programmers to design protocols without restriction and to check them a posteriori 
against any number of required properties, such as deadlock-freedom, termination etc. 
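\[
\begin{array}{rcll}
T & ::= & B \;\mid\; S & \text{(Basic and Session Types)} \\
B & ::= & \mathsf{int} \mid \mathsf{bool} \mid \mathsf{real} \mid \mathsf{unit} \mid \cdots & \\
S & ::= & \&_{i \in I}\{p_i?m_i(T_i).S_i\ [, ⏲.S']\} & \\
  & \mid & \oplus_{i \in I}\{p_i!m_i(T_i).S_i\} & \\
  & \mid & \mu t.S \;\mid\; t \;\mid\; \mathsf{end} & \\[4pt]
M & ::= & p!m(T) \cdot M \;\mid\; \epsilon & \text{(Buffer Types)} \\[4pt]
\tau & ::= & M \;\mid\; S \;\mid\; (M; S) & \text{(Session-Buffer Types)}
\end{array}
\]


Fig. 3. Basic Types, Session Types, Buffer Types and Session-Buffer Types 


\[
T \equiv T \qquad M_1 \cdot M_2 \equiv M_2 \cdot M_1 \qquad \epsilon \cdot \epsilon \equiv \epsilon \qquad \dfrac{M \equiv M' \quad S \equiv S'}{(M; S) \equiv (M'; S')}
\]


Fig. 4. Type congruence rules 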


The novelties of our type system include: (i) undirected branching/selection; 
(ii) timeout branches (syntax in § 3.1); and (iii) reliability sets—sets of roles 
assumed to not fail, from the perspective of each role (§ 3.2). Reliability sets 
(possibly empty) enforce the use of timeouts for all failure-prone communication. 

As in [33], our type system does not use global types, but solely relies on 
local types. Consequently, typing contexts must obey a safety property to ensure 
subject reduction (§ 3.3). Finally, we present the rules for our type system in 
§ 3.4, and discuss its key properties in § 4. 


3.1 Types 


Our MPST theory is designed for the distributed computing setting. Concretely, 
our type system (def. 6) is asynchronous; it allows branching (resp. selection) 
from (resp. to) multiple roles; and supports timeout continuation types. 


Definition 6 (Typing syntax). The typing syntax is defined using the 
grammar in fig. 3. For undirected branching and selection, I ≠ ∅ and role-label tuples 
$(p_i, m_i)$ must be pairwise distinct. Recursion variables cannot be free and must 
appear guarded under branching/selection types. 


Type T denotes either a basic type B, or a session type S, and is used to type 
variables. Session types describe how a channel should be used: (i) undirected 
branching (external choice) $\&_{i \in I}\{p_i?m_i(T_i).S_i\ [, ⏲.S']\}$ denotes receiving a 
message with label $m_i$ and payload of type $T_i$ from role $p_i$, then continuing 
according to $S_i$. The (optional) timeout continuation type S' describes the protocol 
for handling failure on that branch; (ii) undirected selection (internal choice) 
$\oplus_{i \in I}\{p_i!m_i(T_i).S_i\}$ denotes sending a message with label $m_i$ and payload $T_i$ 
to role $p_i$, then continuing according to $S_i$; (iii) type end marks a channel as 
closed, and terminates communication. A session buffer is typed using the buffer 
type M. Entries in the buffer must correspond to the type $p!m(T) \cdot M$, denoting 
a message sent to p with label m and payload of type T. A session with role is 
typed using session-buffer types, combining a session type and a buffer type. 

Type congruence ≡ is defined in fig. 4. Notably, buffer types can be 
reordered, and two session-buffer types are congruent if their individual buffer and 
session types are congruent. Buffer type reordering is necessary to match the 
total message reordering supported by the language (def. 3). 


3.2 Reliability 


We take a short detour to discuss reliability. Previous related works [4,1,38] 
have included the notion of reliability in their type systems. Generally, either 
one specific role, or a pre-defined set of roles, is assumed to be reliable, i.e., 
no failures occur for communication involving the identified set of roles. 

Our definition of reliability (def. 7) is the most general and the first to take 
into account the viewpoint of each role. We argue that this is necessary in a 
distributed setting since reliability in networks is dependent on the physical 
topology of processes. Recalling the ping utility (example 1), we could imagine 
the processes representing roles p and r reside on the same physical hardware, 
thus their communication cannot be affected by network faults; and the process 
for q resides on geographically separated hardware, therefore its communication 
with both p and r is vulnerable to failure. 


Definition 7 (Reliability). The reliability set ℜ for a role p ∈ ρ is defined 
as ℜ ⊆ ρ \ {p}, capturing the viewpoint of p. Reliability R is defined as a 
function mapping roles to their reliability set, i.e., R : ρ ↦ ℜ. 


To better model real distributed environments, our definition of reliability 
allows each role to have its own (possibly empty) reliability set. 


Example 3 (Ping Pong: Reliability Sets). W.r.t. example 1, as the three roles 
have different viewpoints on each other, the reliability set for each of them 
is different. In particular, we have R(p) = {r}, R(r) = {p}, R(q) = ∅. 


Investigating the extremes, we have: for a set of roles ρ, if R(p) = ∅ for all p ∈ ρ, 
then no communication is reliable; conversely, if R(p) = ρ \ {p} for all p ∈ ρ, 
then all communication is reliable, referred to as a reliable network. This work 
only considers static configurations for R, thus reliability sets cannot change 
at runtime. We find that even with this restriction, our definition is the most 
general compared to related work. 
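
A small sketch may help to fix the definition and its two extremes. It assumes reliability is represented as a Python dictionary from role names to sets of role names; the function names are hypothetical and introduced only for this illustration.

```python
def is_valid_reliability(roles, R):
    """R assigns each role a reliability set drawn from the *other* roles."""
    roles = set(roles)
    return set(R) == roles and all(R[p] <= roles - {p} for p in roles)


def is_reliable_network(roles, R):
    """Extreme case: every role considers every other role reliable."""
    roles = set(roles)
    return all(R[p] == roles - {p} for p in roles)


def is_fully_unreliable(roles, R):
    """Extreme case: every reliability set is empty."""
    return all(not R[p] for p in roles)


# Reliability for the ping pong example: p and r trust each other, q trusts no one.
roles = {"p", "q", "r"}
R = {"p": {"r"}, "r": {"p"}, "q": set()}
```

The ping pong configuration is valid but sits strictly between the two extremes: it is neither a reliable network nor fully unreliable.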


3.3 Contexts 


Definition 8 (Type contexts). Context Θ is a partial mapping from process 
variables to n-tuples of types and context Γ is a partial mapping from variables 
to types, and sessions with roles to session-buffer types, both defined below: 


\[ \Theta ::= \emptyset \;\mid\; \Theta, X : T_1, \ldots, T_n \qquad\qquad \Gamma ::= \emptyset \;\mid\; \Gamma, x : T \;\mid\; \Gamma, s[p] : \tau \]

The composition of contexts $(\Gamma_1, \Gamma_2)$ is defined iff: 

\[ \forall c \in dom(\Gamma_1) \cap dom(\Gamma_2) : \Gamma_i(c) = M \;\wedge\; \Gamma_j(c) = S \qquad \text{with } \{i, j\} = \{1, 2\} \]

For such c, $(\Gamma_1, \Gamma_2)(c) = (M; S)$. 
Contexts are congruent, $\Gamma_1 \equiv \Gamma_2$, iff: 


\[ dom(\Gamma_1) = dom(\Gamma_2) \;\wedge\; \forall c \in dom(\Gamma_1) : \Gamma_1(c) \equiv \Gamma_2(c) \]


Context Θ is non-linear and types process variables by tracking the types of 
their parameters. Context Γ is linear and allows variables to have basic or 
session types, and sessions with roles to have session-buffer types; as a program 
progresses, a role may simultaneously have both an active session type and 
messages queued in the message buffer. 

Context composition allows two contexts to coexist as long as their common 
channels map to buffer types in one context, and session types in the other. 

Context congruence holds if two contexts have the same domain and the 
types of their channels are congruent. It is key to note that by the definitions of 
context composition and congruence we have s[p] : (M; S) ≡ s[p] : M, s[p] : S. 
Buffer types (resp. session-buffer types) are only used internally by the type 
system; end-users are not expected to explicitly define these types. 
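
Context composition as described above can be sketched in a few lines. This is an assumption-laden illustration rather than the paper's formalisation: contexts are Python dictionaries, each entry is tagged as ("buffer", M) or ("session", S), and a composed entry is tagged ("both", M, S) to stand for the session-buffer type (M; S). The function name `compose` is hypothetical.

```python
def compose(ctx1, ctx2):
    """Compose two contexts, or raise ValueError if composition is undefined.

    A channel shared by both contexts must carry a buffer type in one of
    them and a session type in the other.
    """
    result = dict(ctx1)
    for channel, entry in ctx2.items():
        if channel not in result:
            result[channel] = entry
            continue
        existing = result[channel]
        if {existing[0], entry[0]} != {"buffer", "session"}:
            raise ValueError(f"composition undefined on {channel}")
        buf = existing if existing[0] == "buffer" else entry
        ses = existing if existing[0] == "session" else entry
        result[channel] = ("both", buf[1], ses[1])
    return result


# s[p] has a queued message in one context and an active session type in the other.
queued = {"s[p]": ("buffer", ["q!ping()"])}
active = {"s[p]": ("session", "S"), "x": ("session", "end")}
combined = compose(queued, active)
```

Composition in this sketch is symmetric on the shared channel, matching the reading that it does not matter which of the two contexts holds the buffer type.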


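Definition 9 (Context reduction). An action α is given as: 

\[ \alpha ::= s[p]!q:m(T) \;\;\mid\;\; s[p][q]:m \;\;\mid\;\; s[p]⏲ \]

From left to right, this reads as (i) a sent message; (ii) communication of 
a message; and (iii) the timeout of a channel. Context transition $\xrightarrow{\alpha}_{(\Sigma;R)}$ is 
defined in fig. 5. We write $\Gamma \xrightarrow{\alpha}_{(\Sigma;R)}$ iff $\exists \Gamma' : \Gamma \xrightarrow{\alpha}_{(\Sigma;R)} \Gamma'$. We define two 
context reductions $\to_{(\Sigma;R)}$ and $\to_s$ as: 


\[
\begin{array}{l}
\Gamma \to_{(\Sigma;R)} \Gamma' \ \text{ holds iff } \ \Gamma \xrightarrow{\alpha}_{(\Sigma;R)} \Gamma' \\[4pt]
\Gamma \to_s \Gamma' \ \text{ holds iff } \ \Gamma \xrightarrow{\alpha}_{(\Sigma;R)} \Gamma' \ \text{ for } \alpha \in \{\, s[p]!q:m(T),\ s[p][q]:m \,\}
\end{array}
\]


We write $\to^{+}_{(\Sigma;R)}$ (resp. $\to^{+}_s$) and $\to^{*}_{(\Sigma;R)}$ (resp. $\to^{*}_s$) for their transitive and 
reflexive/transitive closures respectively. 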


A context Γ keeps track of open buffers using a buffer-tracker Σ. Whenever 
a new session is initialised, it is added to Σ (details in § 3.4, item [T-ν]). For now it 
suffices to know that buffer trackers restrict communication to occur only over 
restricted sessions; thus, by def. 5 (well-formedness), a session 
buffer is guaranteed to exist for all sessions in Σ. 

Context reduction (def. 9) models communication at the type-level. Context 
Γ can reduce by sending, communicating, or timing out. By [Γ-⏲], Γ = s[p] : 
$\&_{i \in I}\{q_i?m_i(T_i).S_i,\ ⏲.S'\}$ can reduce to a timeout branch continuation type S' 
if s is in the buffer-tracker (i.e., a buffer exists for session s), and at least one 
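\[
\text{[Γ-⏲]} \quad
\dfrac{s \in \Sigma \qquad \exists k \in I : q_k \notin R(p)}
      {s[p] : \&_{i \in I}\{q_i?m_i(T_i).S_i,\ ⏲.S'\} \;\xrightarrow{\,s[p]⏲\,}_{(\Sigma;R)}\; s[p] : S'}
\]

\[
\text{[Γ-Snd1]} \quad
\dfrac{s \in \Sigma \qquad k \in I}
      {s[p] : \oplus_{i \in I}\{q_i!m_i(T_i).S_i\} \;\xrightarrow{\,s[p]!q_k:m_k(T_k)\,}_{(\Sigma;R)}\; s[p] : (q_k!m_k(T_k) \cdot \epsilon;\ S_k)}
\]

\[
\text{[Γ-Snd2]} \quad
\dfrac{s \in \Sigma \qquad k \in I}
      {s[p] : (M;\ \oplus_{i \in I}\{q_i!m_i(T_i).S_i\}) \;\xrightarrow{\,s[p]!q_k:m_k(T_k)\,}_{(\Sigma;R)}\; s[p] : (M \cdot q_k!m_k(T_k) \cdot \epsilon;\ S_k)}
\]

\[
\text{[Γ-Com]} \quad
\dfrac{s \in \Sigma \qquad \exists k \in I : (p, m, T) = (p_k, m_k, T_k)}
      {s[p] : q!m(T) \cdot M,\ s[q] : \&_{i \in I}\{p_i?m_i(T_i).S_i\ [, ⏲.S']\} \;\xrightarrow{\,s[p][q]:m\,}_{(\Sigma;R)}\; s[p] : M,\ s[q] : S_k}
\]

\[
\text{[Γ-μ]} \quad
\dfrac{s[p] : S[\mu t.S/t] \;\xrightarrow{\,\alpha\,}_{(\Sigma;R)}\; \Gamma'}
      {s[p] : \mu t.S \;\xrightarrow{\,\alpha\,}_{(\Sigma;R)}\; \Gamma'}
\qquad\qquad
\text{[Γ-Cong]} \quad
\dfrac{\Gamma_1 \;\xrightarrow{\,\alpha\,}_{(\Sigma;R)}\; \Gamma_2}
      {\Gamma, \Gamma_1 \;\xrightarrow{\,\alpha\,}_{(\Sigma;R)}\; \Gamma, \Gamma_2}
\]


Fig. 5. Context reduction rules 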


of the roles in the branch is unreliable. The latter prevents taking a timeout 
for communication that is sure to be delivered. Reductions [Γ-Snd1] and [Γ-Snd2] 
simulate sending a message by reducing the selection type $\oplus_{i \in I}\{q_i!m_i(T_i).S_i\}$ 
to one of its continuations $S_k$, and by inserting the sent message into the buffer 
type. The difference is that [Γ-Snd1] creates the buffer type if it was previously 
not specified, whereas [Γ-Snd2] appends the message to an already existing buffer 
type. Communication between two roles is simulated through [Γ-Com], where a 
branch type $s[q] : \&_{i \in I}\{p_i?m_i(T_i).S_i\ [, ⏲.S']\}$ consumes the message from a 
buffer type $s[p] : q!m(T) \cdot M$, reducing to the continuations $s[p] : M,\ s[q] : S_k$. 
Lastly, [Γ-μ] allows reduction through recursion and [Γ-Cong] reduces substructures 
of compatibly composed contexts. 


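Definition 10. Property $\varphi_s$ is a (Σ; R)-safety property on typing contexts iff: 


\[
\begin{array}{ll}
\text{[S-R1]} & \varphi_s(\Gamma,\ s[p] : \&_{i \in I}\{q_i?m_i(T_i).S_i\}) \;\Longrightarrow\; \forall i \in I : q_i \in R(p) \\[4pt]
\text{[S-R2]} & \varphi_s(\Gamma,\ s[p] : \&_{i \in I}\{q_i?m_i(T_i).S_i,\ ⏲.S'\}) \;\Longrightarrow\; \exists i \in I : q_i \notin R(p) \\[4pt]
\text{[S-Com]} & \varphi_s(\Gamma,\ s[p] : \&_{i \in I}\{q_i?m_i(T_i).S_i\ [, ⏲.S']\},\ s[q] : M) \ \text{ and } \ M \equiv p!m(T) \cdot M' \\
 & \qquad \Longrightarrow\; \exists k \in I : q_k = q \;\wedge\; m_k = m \;\wedge\; T_k = T \\[4pt]
\text{[S-μ]} & \varphi_s(\Gamma,\ s[p] : \mu t.S) \;\Longrightarrow\; \varphi_s(\Gamma,\ s[p] : S[\mu t.S/t]) \\[4pt]
\text{[S-$\to$]} & \varphi_s(\Gamma) \ \text{ and } \ \Gamma \to_{(\Sigma;R)} \Gamma' \;\Longrightarrow\; \varphi_s(\Gamma')
\end{array}
\]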
As previously mentioned, our type system is a generic one that does not use 
syntactic methods of enforcing consistent communication. Therefore, we define 
a safety property in def. 10 on type contexts that is used to guarantee subject 
reduction and other theorems (presented in § 4). 

We say $\varphi_s$ is the largest safety property required to guarantee subject 
reduction. The property can be re-instantiated with more specific conditions (as 
demonstrated in § 5) as per the requirements of the implementation. Concretely, 
[S-R1] ensures that a branch omits its timeout only if all of its senders are 
reliable, and [S-R2] ensures that a branch has a timeout only if at least one of 
its senders is unreliable. 
Condition [S-Com] ensures that communicating messages have matching payload 
types. Lastly, [S-μ] preserves $\varphi_s$ through recursion unfolding and [S-→] requires 
safety to hold after context reduction. 
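
The two reliability conditions on branches can be read as a simple local check. The sketch below covers only those two conditions for a single branch, not the full safety property (it ignores [S-Com], recursion and closure under reduction). The function name and the dictionary representation of R are assumptions made for this illustration.

```python
def timeout_discipline_ok(receiver, senders, has_timeout, R):
    """Check the two reliability conditions for one branch of `receiver`.

    `senders` are the roles the branch may receive from, and R maps each
    role to its reliability set.
    """
    reliable = R[receiver]
    if has_timeout:
        # A timeout branch requires at least one unreliable sender.
        return any(q not in reliable for q in senders)
    # Without a timeout, every sender must be reliable.
    return all(q in reliable for q in senders)


# Reliability sets of the ping pong example.
R = {"p": {"r"}, "r": {"p"}, "q": set()}

r_waits_for_p = timeout_discipline_ok("r", ["p"], has_timeout=False, R=R)
p_waits_for_q = timeout_discipline_ok("p", ["q"], has_timeout=True, R=R)
p_forgets_timeout = timeout_discipline_ok("p", ["q"], has_timeout=False, R=R)
r_needless_timeout = timeout_discipline_ok("r", ["p"], has_timeout=True, R=R)
```

In the running example, r may wait on p without a timeout because p is in its reliability set, whereas p must guard its wait on q with a timeout.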


3.4 Typing Rules 


Our type system is defined by the typing rules in fig. 6. Below we explain them 
in detail. Typing judgements are of the form: Θ · Γ ⊢ P, reading "process P is 
well typed under type contexts Θ and Γ"; and Γ ⊢ d : T, reading "value (or 
variable, or channel) d is of type T under type context Γ". 


[T-0] The inaction process 0 is typed by a context that is "end typed", 
determined by the predicate end(Γ), defined in fig. 7. The predicate holds: (i) if 
Γ = ∅; (ii) if Γ consists of variables, and all the variables are 
either of a basic type, or can be typed by end; and (iii) if Γ consists of 
sessions with roles, and all the channels can be typed by end. 

[T-Var] A variable or session with role c has type T in a context only containing 
the mapping of c to T. 

[T-Val] A value v is typed by a basic type B if v is contained in the set of that 
basic type. E.g., 42 : ℕ is typed under an empty context ∅ since 42 ∈ ℕ. 

[T-X] A process variable X is typed to an n-tuple of types $T_1, \ldots, T_n$ under 
context Θ, if Θ maps the process variable to the same n-tuple of types. 

[T-⊕] The selection process $c \oplus [q_k]!m_k(d).P$ is typed under a context which 
maps the sending channel c to a selection session type $\oplus_{i \in I}\{q_i!m_i(T_i).S_i\}$, 
where a selection option matches the send process, i.e., k ∈ I. The context 
should match the payload d to the type indicated in the selection ($T_k$), and 
continuation process P should be typed under the continuation type $S_k$. 

[T-&] The branching process $c \,\&_{i \in I}\{[p_i]?m_i(x_i).P_i\}$ is typed under a context 
which maps the receiving channel c to a branch type $\&_{i \in I}\{p_i?m_i(T_i).S_i\}$, 
where all roles and message labels of each branch match. Every continuation 
process $P_i$ must be typed under the continuation type $S_i$ and payload typed 
by $T_i$. If the process is a timeout branch $c \,\&_{i \in I}\{[p_i]?m_i(x_i).P_i,\ ⏲.Q\}$, then 
it should be typed by a session type also containing a timeout continuation 
$\&_{i \in I}\{p_i?m_i(T_i).S_i,\ ⏲.S'\}$, and the timeout process Q must be typed by S'. 

[T-Call] A process call $X(d_1, \ldots, d_n)$ is correctly typed if Θ types the process 
variable to an n-tuple of types $T_1, \ldots, T_n$ and Γ maps each parameter $d_i$ to 
the corresponding $T_i$ (for i ∈ 1..n). Any remaining channels in Γ cannot be 
open, and hence must be end typed. 
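
$$
\dfrac{\text{end}(\Gamma)}{\Theta \cdot \Gamma \vdash \mathbf{0}}\ [\text{T-}\mathbf{0}]
\qquad
\dfrac{}{c : T \vdash c : T}\ [\text{T-Var}]
\qquad
\dfrac{v \in B}{\emptyset \vdash v : B}\ [\text{T-Val}]
\qquad
\dfrac{\Theta(X) = T_1, \ldots, T_n}{\Theta \vdash X : T_1, \ldots, T_n}\ [\text{T-X}]
$$

$$
\dfrac{\Gamma_1 \vdash c : \oplus_{i \in I}\{q_i\,!\,m_i(T_i).S_i\} \quad k \in I \quad \Gamma_2 \vdash d : T_k \quad \Theta \cdot \Gamma, c : S_k \vdash P}
      {\Theta \cdot \Gamma, \Gamma_1, \Gamma_2 \vdash c \oplus [q_k]\,!\,m_k\langle d \rangle.P}\ [\text{T-}\oplus]
$$

$$
\dfrac{\Gamma' \vdash c : \&_{i \in I}\{p_i\,?\,m_i(T_i).S_i\ [,\ ⏲.S']\} \quad [\Theta \cdot \Gamma, c : S' \vdash Q] \quad \forall i \in I \cdot \Theta \cdot \Gamma, x_i : T_i, c : S_i \vdash P_i}
      {\Theta \cdot \Gamma, \Gamma' \vdash c\,\&_{i \in I}\{[p_i]\,?\,m_i(x_i).P_i\ [,\ ⏲.Q]\}}\ [\text{T-}\&]
$$

$$
\dfrac{\Theta \vdash X : T_1, \ldots, T_n \quad \text{end}(\Gamma') \quad \forall i \in 1..n \cdot \Gamma_i \vdash d_i : T_i}
      {\Theta \cdot \Gamma_1, \ldots, \Gamma_n, \Gamma' \vdash X\langle d_1, \ldots, d_n \rangle}\ [\text{T-Call}]
$$

$$
\dfrac{\Theta, X : T_1, \ldots, T_n \cdot x_1 : T_1, \ldots, x_n : T_n \vdash P \quad \Theta, X : T_1, \ldots, T_n \cdot \Gamma \vdash Q}
      {\Theta \cdot \Gamma \vdash \textbf{def}\ X(x_1 : T_1, \ldots, x_n : T_n) = P\ \textbf{in}\ Q}\ [\text{T-Def}]
$$

$$
\dfrac{\Theta \cdot \Gamma \vdash P_1 \quad \Theta \cdot \Gamma \vdash P_2}{\Theta \cdot \Gamma \vdash P_1 + P_2}\ [\text{T-}{+}]
\qquad
\dfrac{\Theta \cdot \Gamma \vdash P}{\Theta \cdot \Gamma \vdash_{\emptyset} P}\ [\text{T-Lift}]
\qquad
\dfrac{\text{gc}(\Gamma)}{\Theta \cdot \Gamma \vdash_{\{s\}} s : \epsilon}\ [\text{T-}\epsilon]
$$

$$
\dfrac{\Theta \cdot \Gamma' \vdash_{\{s\}} s : \sigma \quad \Gamma \vdash w : T}
      {\Theta \cdot \Gamma, \Gamma', s[p] : q!m(T) \cdot \epsilon \vdash_{\{s\}} s : (p \triangleright q \triangleleft m(w)) \cdot \sigma}\ [\text{T-}\sigma_1]
$$

$$
\dfrac{\Theta \cdot \Gamma', s[p] : M \vdash_{\{s\}} s : \sigma \quad \Gamma \vdash w : T}
      {\Theta \cdot \Gamma, \Gamma', s[p] : q!m(T) \cdot M \vdash_{\{s\}} s : (p \triangleright q \triangleleft m(w)) \cdot \sigma}\ [\text{T-}\sigma_2]
$$

$$
\dfrac{\Gamma = (\Gamma_0 \rightsquigarrow \Gamma_1), \Gamma_2 \quad \Theta \cdot \Gamma_1 \vdash_{\{s\}} s : \sigma \quad \text{gc}(\Gamma_0, \Gamma_2)}
      {\Theta \cdot \Gamma \vdash_{\{s\}} s : \sigma}\ [\text{T-}\sigma_W]
$$

$$
\dfrac{\Theta \cdot \Gamma_1 \vdash_{\Sigma_1} P_1 \quad \Theta \cdot \Gamma_2 \vdash_{\Sigma_2} P_2 \quad \Sigma_1 \cap \Sigma_2 = \emptyset}
      {\Theta \cdot \Gamma_1, \Gamma_2 \vdash_{\Sigma_1 \cup \Sigma_2} P_1 \mid P_2}\ [\text{T-}{\mid}]
$$

$$
\dfrac{\Gamma' = \{s[p] : \tau_p\}_{p \in I} \quad s \notin \Gamma \quad (\{s\}; R)\text{-}\varphi_s(\Gamma') \quad \Theta \cdot \Gamma, \Gamma' \vdash_{\Sigma} P}
      {\Theta \cdot \Gamma \vdash_{\Sigma \setminus \{s\}} (\nu s : \Gamma')\,P}\ [\text{T-}\nu]
$$

Fig. 6. Typing rules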
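
$$
\dfrac{}{\text{end}(\emptyset)}
\qquad
\dfrac{\forall i \in 1..n \cdot \text{basic}(T_i) \lor x_i : T_i \vdash x_i : \text{end}}{\text{end}(x_1 : T_1, \ldots, x_n : T_n)}
$$

$$
\dfrac{\text{end}(\Gamma_1) \quad \text{end}(\Gamma_2)}{\text{end}(\Gamma_1, \Gamma_2)}
\qquad
\dfrac{\forall i \in 1..n,\ p \in \rho \cdot s_i[p] : T_i \vdash s_i[p] : \text{end}}{\text{end}(s_1[p] : T_1, \ldots, s_n[p] : T_n)}
$$

Fig. 7. Predicate $\text{end}(\Gamma)$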


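$$
\dfrac{}{\text{gc}(\emptyset)}
\qquad
\dfrac{\text{gc}(\Gamma)}{\text{gc}(\Gamma, s[p] : \epsilon)}
\qquad
\dfrac{\text{basic}(T) \quad \text{gc}(\Gamma, s[p] : M)}{\text{gc}(\Gamma, s[p] : q!m(T) \cdot M)}
$$

$$
\dfrac{\Gamma = \Gamma', s'[p'] : T \quad \text{gc}(\Gamma', s[p] : M)}{\text{gc}(\Gamma, s[p] : q!m(T) \cdot M)}
$$

Fig. 8. The garbage collector predicate $\text{gc}(\Gamma)$


$$
\begin{array}{ll}
s[p] : q!m(T) \cdot \epsilon \rightsquigarrow \Gamma, s[p] : M \;=\; \Gamma, s[p] : q!m(T) \cdot M & \\
s[p] : q!m(T) \cdot \epsilon \rightsquigarrow \Gamma \;=\; \Gamma, s[p] : q!m(T) \cdot \epsilon & \text{when } s[p] : M \notin \Gamma
\end{array}
$$

Fig. 9. Message insertion function $\Gamma \rightsquigarrow \Gamma'$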


[T-Def] Process declaration $X(x_1 : T_1, \ldots, x_n : T_n) = P$ is well typed if $P$ is
self-contained, i.e., contexts containing the types of the declaration parameters
(along with any previous $\Theta$) should type $P$. Process definition
$\textbf{def}\ X(x_1 : T_1, \ldots, x_n : T_n) = P\ \textbf{in}\ Q$ is typed under $\Theta \cdot \Gamma$ if its declaration is
well typed and $Q$ is typed under $\Gamma$ and $\Theta$ composed with the new process variable.

[T-+] Non-deterministic choice is well typed if both processes are typed by $\Theta \cdot \Gamma$ in
isolation. This is in line with how case or if-then-else processes are typed.

[T-Lift] We annotate the typing judgement $\Theta \cdot \Gamma \vdash P$ with the buffer-tracker
$\Sigma$ to obtain $\Theta \cdot \Gamma \vdash_{\Sigma} P$, denoting that the sessions in $\Sigma$ occur in $P$. The
lifting rule annotates the typing judgement with an empty buffer-tracker if
the buffer-less judgement ($\vdash$) types $P$ (using the rules mentioned thus far).

[T-$\epsilon$] In standard asynchronous MPST theory, the empty buffer $s : \epsilon$ is typed
under the empty context $\emptyset$, ensuring a one-to-one correlation between buffer
types in the context and messages in a session buffer. However, since our
calculus models message loss, it is possible that a context contains buffer
types for messages that have been dropped from the process buffer. Thus,
our theory types $s : \epsilon$ under a garbage collected $\Gamma$. The predicate gc is
defined in fig. 8, and states that valid leftover types $\text{gc}(\Gamma)$ are: (i) empty;
(ii) empty buffer types; (iii) message buffer types with basic-type payloads;
or (iv) message buffer types with channel payloads that are typed under $\Gamma$.

[T-$\sigma_1$] [T-$\sigma_2$] An entry in a session buffer $s : (p \triangleright q \triangleleft m(w)) \cdot \sigma$ is typed under a
context containing a mapping from $s[p]$ to a buffer type $q!m(T) \cdot M$, matching
the recipient and message label. The message payload $w$ must be of type $T$,
as indicated by the buffer type, and the buffer continuation $s : \sigma$ should be typed
under the buffer continuation type $M$ in the case that it is not empty ([T-$\sigma_2$]).

[T-$\sigma_W$] Weakening allows a session buffer to be typed under a larger context if the
addition can be garbage collected and inserted into the original context using
the message insertion function (fig. 9). This is a partial function that either
adds a message to an existing buffer type, or inserts it as the head of a
new buffer type. Put differently, weakening allows a buffer to be typed under
a larger context containing irrelevant types that can be garbage collected.

[T-$\mid$] If a process $P_1$ is typed by $\Gamma_1$ and process $P_2$ is typed by $\Gamma_2$, then the
composition $\Gamma_1, \Gamma_2$ types the parallel composition $P_1 \mid P_2$. It is also required
that parallel processes cannot each contain a buffer for the same session $s$.
This guarantees the uniqueness of one session buffer per restricted session.

[T-$\nu$] Session restriction $(\nu s : \Gamma')\,P$ requires session $s$ to be instantiated with a $\Gamma'$
mapping each session with role to its session-buffer type. $\varphi_s(\Gamma')$ must hold
to ensure subject reduction, as discussed in § 3.3. Session $s$ should not be
present in a previous context $\Gamma$, and process $P$ should be typed under the
composition of the previous and newly instantiated context with the updated
buffer-tracker, $\Theta \cdot \Gamma, \Gamma' \vdash_{\Sigma} P$ (since the buffer for $s$ is contained in $P$).
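
The message insertion function used by the weakening rule can be pictured with a small sketch. This is an illustrative encoding only: contexts are assumed to contain buffer types exclusively (so the function is total here, whereas fig. 9 defines a partial function), and the names `Message`, `Context` and `insert_message` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed encodings, introduced only for this example.
Message = Tuple[str, str, str]                   # (recipient q, label m, payload type T)
Context = Dict[Tuple[str, str], List[Message]]   # (session s, role p) -> buffer type


def insert_message(entry: Tuple[str, str], message: Message, ctx: Context) -> Context:
    """Insert the single-message buffer type `message . epsilon` for `entry` into `ctx`.

    If ctx already maps entry to a buffer type M, the result maps it to message . M;
    otherwise the result gains a fresh buffer type message . epsilon.
    """
    updated = {key: list(buffer) for key, buffer in ctx.items()}  # leave ctx untouched
    updated[entry] = [message] + updated.get(entry, [])
    return updated


ctx = {("s", "p"): [("q", "pong", "int")]}
print(insert_message(("s", "p"), ("q", "ping", "int"), ctx))
# {('s', 'p'): [('q', 'ping', 'int'), ('q', 'pong', 'int')]}
print(insert_message(("s", "r"), ("q", "ping", "int"), ctx))
# {('s', 'p'): [('q', 'pong', 'int')], ('s', 'r'): [('q', 'ping', 'int')]}
```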


Example 4 (Ping Pong: Type Context). Recalling the ping pong example, the
whole system can then be described by a parallel composition of the three
processes representing each role p, q, r together with an empty buffer, which is
closed under a type context $\Gamma$ with the following typing assumptions.


$$\Gamma = \{s[p] : S_p,\ s[q] : S_q,\ s[r] : S_r\}$$


$$P_{ping} = (\nu s : \Gamma)\ P_p \mid P_q \mid P_r \mid s : \epsilon$$


4 Type Properties 


The main results of our MPST system for MAGπ processes are subject reduction
(theorem 1) and session fidelity (theorem 2). It is key to note that our results
are parametric on the reliability function $R$. Thus, the theorems we present hold
for any configuration of reliability, i.e., from no reliable communication all the
way to completely reliable networks.

In order to synchronise reliability assumptions between types and processes,
we define the reliable process reduction $\to_R$, such that $\to_R\ \subseteq\ \to$.


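Definition 11 (Reliable process reduction). The reliable process reduction
$\to_R$ is inductively defined by the same reduction rules for $\to$ (in fig. 2), with
the following changes³:


$$
\begin{array}{ll}
[\text{R-}⏲] & s[q]\,\&_{i \in I}\{[p_i]\,?\,m_i(x_i).P_i,\ ⏲.Q\} \mid s : \sigma \ \to_R\ Q \mid s : \sigma \quad \text{if } \exists k \in I : p_k \notin R(q) \\[2pt]
[\text{R-}{\downarrow}] & s : (p \triangleright q \triangleleft m(w)) \cdot \sigma \ \to_R\ s : \sigma \quad \text{for } q \notin R(p)
\end{array}
$$

³ For a fully unreliable network, i.e., $\forall p \in \rho \cdot R(p) = \emptyset$, $\to_R$ is equivalent to $\to$.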

Intuitively, the reliable process reduction disregards network faults for reliable
communication. Concretely, a timeout reduction [R-⏲] is only possible if at least
one role in the branch is unreliable; and message loss [R-$\downarrow$] can only occur for
messages that are not reliable from the viewpoint of the sender. This ensures
that no messages are ignored or lost for reliable communication. Proofs of our
theorems, along with any auxiliary results, are given in the technical report [23].
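
The two side conditions of def. 11 can be sketched as predicates over a reliability function. This is a hedged illustration rather than an implementation of the calculus: the dictionary encoding of $R$ and the helper names `may_time_out` and `may_drop` are assumptions made for this example.

```python
from typing import Dict, FrozenSet, Iterable

# Assumed encoding: R maps each role to the set of roles it considers reliable.
Reliability = Dict[str, FrozenSet[str]]


def may_time_out(receiver: str, senders: Iterable[str], R: Reliability) -> bool:
    """Timeout side condition: some sender p_k of the branch is not in R(receiver)."""
    trusted = R.get(receiver, frozenset())
    return any(p not in trusted for p in senders)


def may_drop(sender: str, receiver: str, R: Reliability) -> bool:
    """Message-loss side condition: the receiver is not in R(sender)."""
    return receiver not in R.get(sender, frozenset())


R = {"p": frozenset({"q"}), "q": frozenset({"p"})}
print(may_time_out("q", ["p"], R))       # False: q only listens to a reliable role
print(may_time_out("q", ["p", "r"], R))  # True: r is not reliable for q
print(may_drop("p", "q", R))             # False: q is reliable from p's viewpoint
print(may_drop("p", "r", R))             # True
print(may_drop("p", "q", {}))            # True: fully unreliable network
```

With the empty configuration every timeout and every drop is permitted, which matches the footnote's remark that $\to_R$ then coincides with $\to$.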


4.1 Subject Reduction 


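Using $\to_R$, we now present our result of subject reduction. Intuitively, subject
reduction states that, if a process $P$ is typed under a safe context, and $P$ reliably
reduces to some process $P'$, then the context also reduces (in 0 or 1 steps) to a
safe context, which types the new process $P'$.


Theorem 1 (Subject Reduction).


$$\Theta \cdot \Gamma \vdash_{\Sigma} P \ \text{ and } \ (\Sigma; R)\text{-}\varphi_s(\Gamma) \ \text{ and } \ P \to_R P' \implies$$

$$\exists \Gamma' :\ \Gamma \to_{(\Sigma;R)} \Gamma' \text{ in 0 or 1 steps} \ \text{ and } \ (\Sigma; R)\text{-}\varphi_s(\Gamma') \ \text{ and } \ \Theta \cdot \Gamma' \vdash_{\Sigma} P'$$
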


A key novel result of our type system is that no unexpected network failures
can occur at runtime, i.e., a process always has a failure-handling subprotocol
defined for unreliable communication. This follows from the definition of our
safety property $\varphi_s$ (def. 10) and holds through subject reduction. We state the
result in cor. 1. More precisely, this corollary states that timeout branches are
guaranteed to be defined for unreliable communication. The inverse is stated in
cor. 2, i.e., timeouts are not defined for branches containing only reliable sources.


Corollary 1 (Failure handling safety). Given a reliability function $R$ with
$p \notin R(q)$, and $\Theta \cdot \Gamma \vdash_{\Sigma} P$ with $(\Sigma; R)\text{-}\varphi_s(\Gamma)$ and $P \to^{*}_R P' \equiv C[Q]$ implies
$Q \not\equiv s[q]\,\&_{i \in I}\{\ldots, [p]\,?\,m(x).Q'\}$. I.e., $Q$ cannot be a branch at q receiving from
p and not define a timeout.


Corollary 2 (Reliability adherence). Given a reliability function $R$ with
$R(q) = R_q$, and $\Theta \cdot \Gamma \vdash_{\Sigma} P$ with $(\Sigma; R)\text{-}\varphi_s(\Gamma)$ and $P \to^{*}_R P' \equiv C[Q]$ implies
$Q \not\equiv s[q]\,\&_{i \in I}\{[p_i]\,?\,m_i(x_i).Q_i,\ ⏲.Q'\}$ s.t. $\forall i \in I : p_i \in R_q$. I.e., $Q$ cannot be a
branch at q only receiving from reliable roles $p_i$ and define a timeout.


4.2 Session Fidelity 


Session fidelity states the opposite implication of subject reduction, i.e., if $\Gamma$
types a process $P$, and $\Gamma$ can reduce, then $P$ can match at least one of the
context reductions.

Consequently, relevant properties of process $P$ can be deduced from the
behaviour of its type context $\Gamma$ (as we will see in theorem 3). However, as shown
by Scalas and Yoshida [33, sec. 5.2], the result does not hold for all well-typed
processes. Concretely, session fidelity is violated by: (i) processes that recurse
infinitely without being productive (e.g. $\textbf{def}\ X(x) = X\langle x \rangle\ \textbf{in}\ X\langle s[p] \rangle$); and
(ii) processes that deadlock by interleaving communication across multiparty
sessions. Hence, we assume the necessary conditions on processes to restrict the
aforementioned violations, by adapting [33, def. 5.3].


Definition 12 (Conditions for session fidelity). Assuming $\emptyset \cdot \Gamma \vdash_{\Sigma} P$.
We say that $P$:


1. has guarded definitions iff, for each process definition in $P$ of the form
   $\textbf{def}\ X(x_1 : T_1, \ldots, x_n : T_n) = Q\ \textbf{in}\ P'$,
   $\forall j \in 1..n$: if $T_j$ is a session type, then a process call $Y\langle \ldots, x_j, \ldots \rangle$ can
   only occur in $Q$ as a subterm of
   $x_j\,\&_{i \in I}\{[p_i]\,?\,m_i(y_i).P_i\ [,\ ⏲.P']\}$ or $x_j \oplus [p]\,!\,m\langle y \rangle.P''$,
   i.e., after $x_j$ is used for input or output.
2. only plays role p in $s$, by $\Gamma$ iff: (i) $P$ has guarded definitions (from 1);
   (ii) $\text{fv}(P) = \emptyset$; (iii) $\Gamma = \Gamma_0, s[p] : T$ with $T \neq \text{end}$ and $\text{end}(\Gamma_0)$; and
   (iv) for all $(\nu s' : \Gamma')\,P'$ subterm of $P$, $\text{end}(\Gamma')$.


We say “$P$ only plays role p in $s$” iff $\exists \Gamma : \emptyset \cdot \Gamma \vdash_{\Sigma} P$ and condition 2 holds.


Def. 12 formalises guarded recursion in condition 1, and the notion of only
playing a single role for a given session in condition 2. Together, these conditions
ensure that session fidelity, stated in theorem 2, holds for all well-typed processes.


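Theorem 2 (Session Fidelity). Assuming $\emptyset \cdot \Gamma \vdash_{\Sigma} P$ with $(\Sigma; R)\text{-}\varphi_s(\Gamma)$,
$P \equiv (\Pi_{p \in I} P_p) \mid s : \sigma$ and $\Gamma = \bigcup_{p \in I} \Gamma_p$, and for each $P_p$: (i) $\emptyset \cdot \Gamma_p \vdash_{\emptyset} P_p$, and
(ii) $P_p$ being $\mathbf{0}$ (up-to-$\equiv$) or only plays role p in $s$, by $\Gamma_p$. Then,

$\Gamma \to_{(\Sigma;R)}$ implies $\exists \Gamma', P'$: (i) $\Gamma \to_{(\Sigma;R)} \Gamma'$, (ii) $P \to^{*}_R P'$, (iii) $\emptyset \cdot \Gamma' \vdash_{\Sigma} P'$
with $(\Sigma; R)\text{-}\varphi_s(\Gamma')$, (iv) $P' \equiv (\Pi_{p \in I} P'_p) \mid s : \sigma'$ and $\Gamma' = \bigcup_{p \in I} \Gamma'_p$, and (v) for
each $P'_p$: $\emptyset \cdot \Gamma'_p \vdash_{\emptyset} P'_p$, and $P'_p$ is $\mathbf{0}$ (up-to-$\equiv$) or only plays role p in $s$, by $\Gamma'_p$.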


4.3 Process Properties 


Our result of session fidelity (§ 4.2) allows us to infer runtime properties about
programs in MAGπ from their types. We proceed by defining desirable runtime
properties on processes (def. 13); expressing the equivalence of these properties
at type-level (def. 14); and presenting our result of process properties verification
(theorem 3), linking process properties to their type-level equivalences.

From def. 13 below, a process is: (i) $R_F$-communication-safe (new w.r.t. [33])
if it reaches the end of communication over reliable reductions and has no leftover
messages in the buffer; (ii) deadlock-free if it either reduces or it is inaction;
(iii) terminating if it is deadlock free and can reach inaction in a finite number
of steps; (iv) never-terminating if it can always infinitely reduce; and (v) live
if, for every reliable branch it can reduce to, it can eventually reduce to some
branch continuation. We need not consider branches with timeouts since these
are trivially live, given that a process can always reduce over the timeout.


Definition 13 (Process properties, adapted from [33]). For some reliability
function $R$, and full reliability function $R_F$, a process $P$ is said to be:


(i) $R_F$-communication-safe iff
    $P \to^{*}_{R_F} P' \not\to_{R_F}$ and $P' \equiv C[s : \sigma]$ implies $\sigma \equiv \epsilon$;

(ii) deadlock-free iff $P \to^{*}_R P' \not\to_R$ implies $P' \equiv \mathbf{0}$;

(iii) terminating iff it is deadlock free, and
    $\exists i$ finite s.t. $\forall n \geq i$: $P = P_0 \to_R P_1 \to_R \cdots \to_R P_n$ implies $P_n \not\to_R$;

(iv) never-terminating iff $P \to^{*}_R P'$ implies $P' \to_R$;

(v) live iff $P \to^{*}_R P' \equiv C[Q]$ implies:
    if $Q = c\,\&_{i \in I}\{[q_i]\,?\,m_i(x_i).Q_i\}$, then
    $\exists C', k \in I, w : P' \to^{*}_R C'[Q_k\{w / x_k\}]$.


Note that, differently from other works [4,33], our definition of liveness only
speaks about receiving processes, and not sending. Typically, liveness also
requires that a sent message (in the case of MAGπ, any message in a session
buffer) is always eventually consumed. However, because of the failures that
our calculus models, it is possible that a process is live and still has
unconsumed messages in the buffer (e.g., as a result of timing out due to a message
delay). Additionally, for an $R_F$-communication-safe process it follows that all sent
messages are consumed in the reliable case. Hence, the traditional definition of
liveness still holds for reliable network configurations, and our new definition
provides the largest guarantees possible given the failure assumptions.

We now present the type-level equivalences of the above process properties.
For liveness, we generalise to the largest liveness property, as done with safety in
def. 10, allowing users to define more fine-grained notions of liveness, if required.

From def. 14 below, a type context is: (i) $R_F$-communication-safe if it has
no populated buffer types when it can no longer reliably reduce; (ii) deadlock-free
if the reason why it can no longer reduce is because it is end typed (and
possibly, as a result of network failures, has some leftover types that can be
garbage collected); (iii) terminating if it is deadlock free and can reach the end
of the protocol in a finite number of steps; (iv) never-terminating if it can always
infinitely reduce; and (v) live if, for every reliable branch it can reduce to, there
is a series of steps that can reduce to a continuation of that branch.


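Definition 14 (Type context properties). For some reliability function $R$,
a full reliability function $R_F$, and a set of sessions $\Sigma$, we say context $\Gamma$ is:

(i) $(\Sigma; R_F)$-communication-safe iff
    $\Gamma \to^{*}_{(\Sigma;R_F)} \Gamma' \not\to_{(\Sigma;R_F)}$ and $s[p] : M \in \Gamma'$ implies $M = \epsilon$;

(ii) $(\Sigma; R)$-deadlock-free iff
    $\Gamma \to^{*}_{(\Sigma;R)} \Gamma' \not\to_{(\Sigma;R)}$ implies $\Gamma' = \Gamma'_0, \Gamma''$ s.t. $\text{end}(\Gamma'_0)$ and $\text{gc}(\Gamma'')$;

(iii) $(\Sigma; R)$-terminating iff it is $(\Sigma; R)$-deadlock-free, and $\exists i$ finite s.t.
    $\forall n \geq i$: $\Gamma = \Gamma_0 \to_{(\Sigma;R)} \Gamma_1 \to_{(\Sigma;R)} \cdots \to_{(\Sigma;R)} \Gamma_n$ implies $\Gamma_n \not\to_{(\Sigma;R)}$;

(iv) $(\Sigma; R)$-never-terminating iff $\Gamma \to^{*}_{(\Sigma;R)} \Gamma'$ implies $\Gamma' \to_{(\Sigma;R)}$;

(v) $(\Sigma; R)$-live iff it obeys some liveness property $(\Sigma; R)\text{-}\varphi_l$ s.t.:

$$
\begin{array}{l}
(\Sigma; R)\text{-}\varphi_l(\Gamma, s[p] : S) \text{ and } S = \&_{i \in I}\{q_i\,?\,m_i(T_i).S_i\} \implies \exists \Gamma', k \in I : \Gamma, s[p] : S \to^{*}_{(\Sigma;R)} \Gamma', s[p] : S_k \\[2pt]
(\Sigma; R)\text{-}\varphi_l(\Gamma, s[p] : \mu t.S) \implies (\Sigma; R)\text{-}\varphi_l(\Gamma, s[p] : S\{\mu t.S / t\}) \\[2pt]
(\Sigma; R)\text{-}\varphi_l(\Gamma) \text{ and } \Gamma \to_{(\Sigma;R)} \Gamma' \implies (\Sigma; R)\text{-}\varphi_l(\Gamma')
\end{array}
$$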
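
To make the shape of such type-level checks concrete, the sketch below tests deadlock-freedom on an abstract, finite transition system. It is only an illustration under stated assumptions: states stand in for type contexts, `successors` stands in for $\to_{(\Sigma;R)}$, `is_finished` stands in for "end typed up to garbage-collectable leftovers", and the reachable state space is assumed finite. None of these names come from the formal development.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def deadlock_free(initial: Hashable,
                  successors: Callable[[Hashable], Iterable[Hashable]],
                  is_finished: Callable[[Hashable], bool]) -> bool:
    """Return True iff every reachable state that cannot step is finished."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        next_states = list(successors(state))
        if not next_states and not is_finished(state):
            return False  # a stuck state that is not the end of the protocol
        for nxt in next_states:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


# Toy protocol: a receive that either gets its message or times out.
ok_steps = {"start": ["wait"], "wait": ["done", "timeout"], "timeout": ["done"], "done": []}
print(deadlock_free("start", lambda s: ok_steps[s], lambda s: s == "done"))   # True

# Same protocol without a way out of the waiting state.
bad_steps = {"start": ["wait"], "wait": []}
print(deadlock_free("start", lambda s: bad_steps[s], lambda s: s == "done"))  # False
```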


We are now ready to use these type-level equivalent properties to infer be- 
haviours of the processes they type. We present our result in theorem 3 which 
formally states that, under the same assumptions given in session fidelity (theo- 
rem 2), if a process is typed under some type context, and a property holds on 
that context, then the same property holds for the process itself. 


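Theorem 3 (Process properties verification). Assuming: $\emptyset \cdot \Gamma \vdash_{\Sigma} P$ with
$(\Sigma; R)\text{-}\varphi_s(\Gamma)$, $P \equiv (\Pi_{p \in I} P_p) \mid s : \sigma$ and $\Gamma = \bigcup_{p \in I} \Gamma_p$. Further, for each $P_p$:
(i) $\emptyset \cdot \Gamma_p \vdash_{\emptyset} P_p$, and (ii) $P_p \equiv \mathbf{0}$ or $P_p$ only plays role p in $s$, by $\Gamma_p$. Then,
$\forall \varphi \in \{R_F\text{-communication-safe},$ deadlock-free, terminating, never-terminating,
live$\}$, if $(\Sigma; R)\text{-}\varphi(\Gamma)$, then $P$ is $\varphi$.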


4.4 Decidability 


Since MAGπ is Turing-complete, determining the properties listed in def. 13
from processes is undecidable [5]. A benefit of our generalised theory is that
undecidable process properties can be inferred from decidable type-level properties.


Theorem 4 (Decidability). If $(\Sigma; R)\text{-}\varphi(\Gamma)$ is decidable, then “$\Theta \cdot \Gamma \vdash_{\Sigma} P$
with $(\Sigma; R)\text{-}\varphi(\Gamma)$” is decidable.


Our decidability result (theorem 4) states that for any decidable type-level
property, type-checking with that property is decidable. However, since MAGπ
is asynchronous, we have no results on decidability of $\varphi$. On the contrary, as
discussed in [33, sec. 7], type-level properties for asynchronous type theories are,
in some cases, undecidable. This is a result of pairing buffer types with session
types, which makes the type system Turing-powerful [3, thm. 2.5]. Scalas and
Yoshida [33] address this issue through two methods: (i) standard global types
produce type contexts that can be captured through a decidable consistency
property; and (ii) restricting the size of the message buffer to make properties
decidable. The former ensures decidability by restricting communication to
match the expressivity of global types. For the latter, they show that any type
context that remains bound within a finite-sized buffer is decidable (since the
type has a finite state transition system representation). In line with their
results, we lift their definition of boundedness, i.e., a restriction on the size of a
buffer, to MAGπ’s type system.


Definition 15 (Boundedness, from [33]). We say $\Gamma$ is $(\Sigma; R)\text{-bound}_k$ iff

$$\exists k \in \mathbb{N} : \Gamma \to^{*}_{(\Sigma;R)} \Gamma', s[p] : M \ \text{ implies } \ |M| < k.$$

We say $\Gamma$ is $(\Sigma; R)$-bounded iff $\exists k$ finite : $(\Sigma; R)\text{-bound}_k(\Gamma)$.


Using def. 15, we present our result of decidable bounded properties in theorem 5.


Theorem 5 (Decidable bounded properties). $(\Sigma; R)\text{-bound}_k(\Gamma)$ is
decidable for all $\Sigma$, $R$, and $k$. Furthermore, if $(\Sigma; R)\text{-bounded}(\Gamma)$, then $\forall \varphi \in$
$\{R_F$-communication-safe, deadlock-free, terminating, never-terminating, live$\}$,
it holds that $(\Sigma; R)\text{-}\varphi(\Gamma)$ is decidable.


Thus, decidability is guaranteed for all protocols expressible through stan- 
dard asynchronous global type theory, and all protocols that use finite message 
buffers—now with the benefit of reasoning about and handling network errors! 
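
The intuition behind deciding a fixed bound can be sketched as a search that stops as soon as some buffer grows too large. The code below is a hedged illustration on an abstract transition system, not the decision procedure of [33]: `successors` and `buffer_sizes` are hypothetical stand-ins for context reduction and buffer-type lengths, the bound is read as "at most k" (an assumption; tighten the comparison for a strict bound), and only finitely many states are assumed to exist within the bound.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def is_bound_k(initial: Hashable,
               successors: Callable[[Hashable], Iterable[Hashable]],
               buffer_sizes: Callable[[Hashable], Iterable[int]],
               k: int) -> bool:
    """Return True iff no reachable state has a buffer holding more than k messages."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        if any(size > k for size in buffer_sizes(state)):
            return False  # bound exceeded: stop without exploring further
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


def producer_consumer(state):
    """Toy system: state = (messages still to send, messages currently buffered)."""
    remaining, buffered = state
    steps = []
    if remaining > 0:
        steps.append((remaining - 1, buffered + 1))  # send
    if buffered > 0:
        steps.append((remaining, buffered - 1))      # receive
    return steps


print(is_bound_k((3, 0), producer_consumer, lambda st: [st[1]], 3))  # True
print(is_bound_k((3, 0), producer_consumer, lambda st: [st[1]], 2))  # False
# A sender that never stops is rejected for any fixed k, and the search still halts.
print(is_bound_k(0, lambda n: [n + 1], lambda n: [n], 5))            # False
```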


Example 5 (Ping Pong: Properties). Inspecting the types in example 1 and 
example 4, we can conclude that I’ = {s[p] : Sp,s[q] : Sg, s[r] : S,} is boundg. 
By theorem 5, type-level properties of $\Gamma$ are decidable. On checking them, we
determine that $\Gamma$ is: (i) safe, it satisfies the safety property (def. 10) required
for subject reduction; (ii) $R_F$-communication-safe, since if we only consider
reliable reductions, no buffer types remain populated; (iii) terminating, since
we can count the number of steps taken to reach the end of the protocol; and
(iv) live, as reliable communication S, always reduces—i.e., a result is always 
obtained. 


5 Generalising Network Assumptions 


The work presented thus far covers worst-case network assumptions for
communication. As beneficial as this may be for low-level network programming, and
for complex distributed applications with minimal assumptions (e.g. consensus
protocols), not all applications are built on these pessimistic conditions. For example,
many distributed applications operate over the Transmission Control Protocol
(TCP), and thus assume that if consecutive messages are received from the same
source, then they are guaranteed to arrive in the order in which they were sent.

We now showcase the few changes to MAGπ required to alter its network
assumptions. It is key to note that these changes produce a subset of MAGπ,
thus all relevant properties continue to be valid for its TCP-compliant version.


5.1 From Total to Partial Reordering


In a reliable network configuration designed to run over TCP, message reordering
for communication between two parties is guaranteed to not occur. Therefore,
we can adjust the message reordering of MAGπ to model this environment, and
strengthen our safety property $\varphi_s$ to TCP-safe communication. MAGπ models
message reordering through buffer congruence rules. Therefore, strengthening
congruence suffices to restrict communication to the TCP-safe assumptions.


Definition 16 (TCP process-congruence). The process congruence for the
TCP-compliant subset of MAGπ, $\equiv_{TCP}$, is inductively defined using the same
rules defining $\equiv$ (in def. 3), but with the following change:


$$s : \sigma_1 \cdot h_1 \cdot h_2 \cdot \sigma_2 \ \equiv\ s : \sigma_1 \cdot h_2 \cdot h_1 \cdot \sigma_2$$

replaced by

$$
\dfrac{p_1 \neq p_2 \ \text{ or } \ q_1 \neq q_2}
      {\begin{array}{l}
       s : \sigma_1 \cdot (p_1 \triangleright q_1 \triangleleft m_1(w_1)) \cdot (p_2 \triangleright q_2 \triangleleft m_2(w_2)) \cdot \sigma_2 \\
       \quad \equiv_{TCP}\ s : \sigma_1 \cdot (p_2 \triangleright q_2 \triangleleft m_2(w_2)) \cdot (p_1 \triangleright q_1 \triangleleft m_1(w_1)) \cdot \sigma_2
       \end{array}}
$$


To obtain the TCP-compliant subset of MAGπ, we assume reductions over
fully reliable networks and adopt TCP process congruence from def. 16, which
no longer allows reordering of messages exchanged between the same pair of roles.
We now reflect this definition of TCP congruence at the type-level in def. 17, and
use this to define a TCP-safety property on type contexts in def. 18.
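
The effect of restricting swaps to messages on different sender/receiver pairs can be sketched on plain lists. Under the reading that adjacent swaps are the only source of reordering, two buffers are related by the restricted rule exactly when every (sender, receiver) link carries the same messages in the same order, whereas the unrestricted rule relates any two permutations. The tuple encoding and the function names below are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed encoding of a buffered message: (sender, receiver, label).
Message = Tuple[str, str, str]


def per_link_order(buffer: Sequence[Message]) -> Dict[Tuple[str, str], List[str]]:
    """Project a buffer onto each (sender, receiver) link, keeping message order."""
    links: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for sender, receiver, label in buffer:
        links[(sender, receiver)].append(label)
    return dict(links)


def tcp_congruent(left: Sequence[Message], right: Sequence[Message]) -> bool:
    """Reordering allowed only across different links (partial reordering)."""
    return per_link_order(left) == per_link_order(right)


def fully_reorderable(left: Sequence[Message], right: Sequence[Message]) -> bool:
    """Reordering allowed between any two adjacent messages (total reordering)."""
    return Counter(left) == Counter(right)


a = [("p", "q", "m1"), ("p", "q", "m2"), ("r", "q", "m3")]
b = [("r", "q", "m3"), ("p", "q", "m1"), ("p", "q", "m2")]  # m3 moved across another link
c = [("p", "q", "m2"), ("p", "q", "m1"), ("r", "q", "m3")]  # m1 and m2 swapped on one link

print(tcp_congruent(a, b))      # True
print(tcp_congruent(a, c))      # False
print(fully_reorderable(a, c))  # True
```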


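Definition 17 (TCP type-congruence). The type congruence for the
TCP-compliant subset of MAGπ, $\equiv_{TCP}$, is inductively defined using the same rules as
$\equiv$ (fig. 4), but with the following change:


$$M_1 \cdot M_2 \ \equiv\ M_2 \cdot M_1$$

replaced by

$$
\dfrac{p \neq q}{p!m_1(T_1) \cdot q!m_2(T_2) \cdot M \ \equiv_{TCP}\ q!m_2(T_2) \cdot p!m_1(T_1) \cdot M}
$$


Definition 18 (TCP safety). Predicate $\varphi_{TCP}$ is a $\Sigma$-TCP-safety property
on typing contexts iff:

$$
\begin{array}{l}
\varphi_{TCP}(\Gamma, s[p] : \&_{i \in I}\{q_i\,?\,m_i(T_i).S_i\}, s[q] : M) \\
\quad \text{and } M \equiv_{TCP} p!m(T) \cdot M' \\
\quad \text{and } \exists k \in I : q_k = q \implies m_k = m \land T_k = T \\[2pt]
\varphi_{TCP}(\Gamma, s[p] : \mu t.S) \implies \varphi_{TCP}(\Gamma, s[p] : S\{\mu t.S / t\}) \\[2pt]
\varphi_{TCP}(\Gamma) \text{ and } \Gamma \to_{(\Sigma;R_F)} \Gamma' \implies \varphi_{TCP}(\Gamma')
\end{array}
$$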


Similar to our previous definition of safety in def. 10, TCP safety ensures that
payload types of communicating entities match. In addition, it also requires
correct ordering of messages (up to $\equiv_{TCP}$) by checking message labels. This is
possible since messages between two parties do not get reordered, and so they
must be received in the same order they are sent. In order to benefit from the
session theorems proved in § 4, all that is required is to show that $\varphi_{TCP} \subseteq \varphi_s$,
i.e., any context that is TCP-safe is also safe. This is the only requirement since
all theorems in § 4 (i) are parametric on the reliability function $R$, including
fully reliable networks; and (ii) are proven for $(\Sigma; R)\text{-}\varphi_s(\Gamma)$.

Proposition 1 (Containment of $\varphi_{TCP}$ in $\varphi_s$). $\forall \Gamma \in \varphi_{TCP} : \Gamma \in \varphi_s$.


Proof. $\varphi_{TCP}$ uses a fully reliable configuration of MAGπ, i.e., is void of
failure-handling timeouts, and thus trivially abides by [S-R$_1$] and [S-R$_2$]. [S-$\mu$] is reflected
directly in $\varphi_{TCP}$. [S-$\to$] is reflected for $R = R_F$, i.e., for a fully reliable
configuration. [S-Com] is never violated by $\Gamma \in \varphi_{TCP}$ since $\equiv_{TCP}\ \subseteq\ \equiv$.


6 Case Study 


This work presents the Ping (examples 1-5) and Domain Name System (§ 6.1)
examples as they are widely known, and between them cover the full range of
our contributions. Previous related works are not expressive enough to model
either protocol with our range of failure assumptions. Thus Ping and DNS are
suitable to illustrate how MAGπ pushes the boundaries of MPST. Additional
examples are provided in the technical report [23].


6.1 DNS 


We now demonstrate the key features of MAGπ through a case study. We
present a multiparty example of a Domain Name System (DNS) with a cache
and inbuilt load-balancer. This example: (i) reasons about failures in its
unreliable connections that are specified using our novel viewpoint-specific reliability
sets; (ii) defines failure-handling protocols for these possible failures; (iii) is
bounded (def. 15), and thus has decidable type-level properties; and (iv) is safe,
$R_F$-communication-safe, deadlock-free, terminating, and live. Typically, DNS is
implemented over TCP; however, the distributed components can still suffer
hardware failures. To cater for this, and for better demonstration of our contributions,
we describe the protocol in our failure-prone setting.


Specification We consider a specification of a client-DNS interaction, where
the client consults a cache, and the DNS delegates requests to workers.

The client, represented by role c, wishes to retrieve a web-address for a
particular URL, and can do so by issuing a request to the DNS. As an optimisation,
the client also stores recently retrieved addresses in a local and reliable cache;
thus, before issuing new requests to the DNS, it first consults this cache. Upon
receiving a request, the DNS offloads processing work to one of two workers,
represented by roles $w_1$ and $w_2$. After retrieving the appropriate address, the
worker sends the response to the client.

The reliability configuration of this application is as follows: the client and
cache have reliable connections, formally $R(c) = \{cache\}$ and $R(cache) = \{c\}$;
the DNS and workers have reliable connections, formally $R(DNS) = \{w_1, w_2\}$
and $R(w_1) = R(w_2) = \{DNS\}$; all other communications are unreliable.

We now present the session types specifying the communication protocol for
this distributed application. We adopt shorthand notation for singleton selections,
and omit payload types for simplicity, as with the ping example.
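
Example 6 (DNS protocol).


$$
S_c = cache\,!\,req().\ \&\left\{
\begin{array}{l}
cache\,?\,ans().\text{end}, \\
cache\,?\,404().DNS\,!\,req().\ \&\left\{
\begin{array}{l}
w_1\,?\,ans().cache\,!\,new().\text{end}, \\
w_2\,?\,ans().cache\,!\,new().\text{end}, \\
⏲.\,cache\,!\,ko().\text{end}
\end{array}
\right\}
\end{array}
\right\}
$$

$$
S_{cache} = \ldots \left\{\ c\,!\,ans().\text{end},\ \ldots\ \right\}
$$

$$
S_{DNS} = \&\left\{
\begin{array}{l}
c\,?\,req().\ \oplus\left\{
\begin{array}{l}
w_1\,!\,req().w_2\,!\,ko().\text{end}, \\
w_2\,!\,req().w_1\,!\,ko().\text{end}
\end{array}
\right\}, \\
⏲.\,w_1\,!\,ko().w_2\,!\,ko().\text{end}
\end{array}
\right\}
$$

$$
S_{w_i} = \&\left\{
\begin{array}{l}
DNS\,?\,req().c\,!\,ans().\text{end}, \\
DNS\,?\,ko().\text{end}
\end{array}
\right\}
$$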


Our viewpoint-specific definition of reliability is necessary to specify the
reliable connections with the DNS and workers whilst maintaining unreliable
connections with the client. Additionally, the client type $S_c$ (resp. the DNS type
$S_{DNS}$) is dependent on using undirected branching (resp. selection). Hence this
example is not expressible using previous theory [4,33].


7 Related Work, Conclusions and Future Work 


Modelling failures has become a relevant and widely researched topic in recent
years. We elaborate on how our generic type system and modular language differ
from, and in some cases may possibly subsume, related work.

Majumdar et al. [24] introduce undirected branching as a means of catering
for the non-deterministic partial reordering of messages that is possible in
networks using the Transmission Control Protocol (TCP). As shown in § 5, the
modularity of our type system allows MAGπ to be adapted to support this
network configuration, as well as other settings with lower levels of abstraction.

Affine type systems define types that can be used at most once. Affine session 
types [25,12,6] use affine typing metatheory to allow sessions to be prematurely 
cancelled in the event of failure. These works only model application-level failure 
(using try/catch blocks) and do not necessarily describe how a failure is handled, 
but only allow the initial protocol to be abandoned if failure occurs. 

Viering et al. [38] present an MPST theory for event-driven distributed
systems, where processes are restarted by monitors if they crash. This approach
requires a centralised reliable node, a notion that is subsumed by our
viewpoint-specific definition of reliability, def. 7.

Chen et al. [8] remove the need for a centralised reliable node. They equip
their type system with synchronisation points capable of detecting and handling
failures raised by the nodes that experience them. Similarly, Adameit et al. [1]
consider an environment free from a centralised reliable node where unstable
links between participants can fail. They introduce the concept of optional blocks,
allowing default values to substitute data not received due to communication
failure. Viering et al. [37], motivated by consensus algorithms, delegate a group
of processes as a permanently available recovery system capable of monitoring
processes and informing them of failures. Thus, they no longer rely on one
centralised robust node, but instead assume that at least some of the processes that
make up the coordinator are alive at any given time. The drawback in these
approaches is their reliance on coordination to handle faults. This may not be
suitable with certain network configurations and failure-models. Since our type
system handles failure through low-level techniques, it remains agnostic to the
types of failures, and is suitable for any non-Byzantine network configuration.

Recent work by Peters et al. [28] extends global type theory with failure
annotations, marking communication susceptible to failures and the kind of
failure (specifically either process crashes or message loss). They handle failure
by defining default values and branches. Since the theory is an extension of global
types, it suffers from the same problems that are addressed through generalised
MPST. Additionally, the work is not agnostic to failure-models, and so it is
uncertain if the theory is capable of modelling failures other than the two considered.

Most similar to MAGπ is work by Barwell et al. [4], where generalised
session type theory is extended to reason about crash-stop failures. They
reserve the crash message label, which can be used in receive branches to detect
node failure and specify failure-handling subprotocols. In line with our research,
their type system is generic, thus improving its expressiveness. However, unlike
MAGπ, their theory is not asynchronous, does not support undirected
branching/selection, and assumes crash-stops to be the only possible faults, whereas we
address and capture a range of failures such as crash failures, link failures, message
loss, delays and reordering, and network partitioning.

Distributed variations of the π-calculus [2,30,7,13] introduce process
locations, i.e., representations of real-world physical hardware. Processes are assigned
to locations to form a topology, and locations can be crashed to model failures.
None of these calculi model the range of failures that are supported by MAGπ,
nor do they have type systems to ensure communication-safe failure handling.

To conclude the paper, we presented MAGπ, a Multiparty, Asynchronous
and Generalised π-calculus which addresses the widest set of non-Byzantine
faults by using timeouts and the most general reliability definition. Our language
builds on the generalised and asynchronous MPST, which is the most flexible
for distributed programming. We prove subject reduction and session fidelity, a
series of process properties, as well as fault-handling safety and reliability
adherence. As future work, we aim to investigate linear logic for Curry-Howard
correspondences in order to understand the foundational and canonical meaning
of faults and reliability. We aim to investigate Byzantine faults in combination
with the non-Byzantine faults addressed here. Lastly, we will explore the use of
model checking to streamline the verification of process properties.

Abstract. We study increasingly expressive type systems, from F^μ (an
extension of the polymorphic lambda calculus with equirecursive types)
to F^{μ;}_ω (the higher-order polymorphic lambda calculus with equirecursive
types and context-free session types). Type equivalence is given by a
standard bisimulation defined over a novel labelled transition system for
types. Our system subsumes the contractive fragment of F^μ_ω as studied
in the literature. Decidability results for type equivalence of the various
type languages are obtained from the translation of types into objects 
of an appropriate computational model: finite-state automata, simple 
grammars and deterministic pushdown automata. We show that type 
equivalence is decidable for a significant fragment of the type language. 
We further propose a message-passing, concurrent functional language 
equipped with the expressive type language and show that it enjoys 
preservation and absence of runtime errors for typable processes. 


Keywords: System F, Higher-order kinds, Context-free session types 


1 Introduction 


Extensions of the λ-calculus to include increasingly sophisticated type
structures have been extensively studied and have led to systems whose importance
is widely recognized: System F [60], System F^μ [30], System F_ω [36], System
F^μ_ω [14]. Ideally, we would like to combine a wishlist of type structures and get
a super-powerful system with vast expressiveness. However, the expressiveness 
of types is naturally limited by the universe where they are supposed to live: 
programming languages. Expressive type systems pose challenges to compilers 
that other (less expressive) types do not even reveal; one such example is type 
equivalence checking. 

System F can be enriched with different type constructors for specifying 
communication protocols. We analyse the impact of combinations of such con- 
structors on the type equivalence problem. In order to do so, we extend System F 
with session types [42,43,67]. Session types provide for detailed protocol specifi- 
cations in the form of types. Traditional recursive session types are limited to tail 
recursion, thus failing to capture all protocols whose traces cannot be
characterized by regular languages. Context-free session types overcome this limitation by
extending types with a notion of sequential composition, T; U [2,68]. The set of 
types together with the ; binary operation constitutes a monoid, for which a new 
type, Skip, acts as the neutral element and End acts as an absorbing element. 


The regular recursive type μα: s.&{Done: End, More: ?Int; α} describes an
integer stream as seen from the point of view of the consumer. It offers a choice
between Done, after which the channel must be closed (as witnessed by type
End), and More, after which an integer value must be received, followed by the
rest of the stream. Types are categorised by kinds, so that we know that the
recursion variable α is of kind session (denoted by s) and, thus, can be used
with semicolon. Instead, we might want to write a type with a more context-free
flavour. The type μα: s.&{Leaf: Skip, Node: α; ?Int; α}; End describes a
protocol for the type-safe streaming of integer trees on channels. The continuation
to the Leaf option is Skip, where no communication occurs but the channel is
still open for further composition. The continuation to the Node choice receives
a left subtree, an integer at the root and a right subtree. In either case, once
the whole tree is received, the channel must be closed, as witnessed by the final
End. Beyond first-order context-free session types (where only basic types are
exchanged) [2,68] we may be interested in higher-order session types capable of
exchanging values of complex types [19]. A goal of this paper is the integration
of higher-order context-free session types into System F^μ_ω. We want to be able
to abstract the type that is received on a tree channel, which is now possible by
writing λα: T.μβ: s.&{Leaf: Skip, Node: β; ?α; β}; End, where T is the kind of
functional types.


A form of abstraction over session types with general recursion was proposed 
by Das et al. [24,25] via (nested) parametric polymorphism. In the notation of 
Das et al., we can write a type equation for abstracting the type being received 
on a stream channel Stream(α) = &{Done: End, More: ?α; Stream(α)}. Using
abstractions, we can write Stream as a function of its parameter α, Stream =
λα: T.&{Done: End, More: ?α; Stream α}; alternatively, we can use the μ-operator
to rewrite the Stream type as λα: T.(μβ: s.&{Done: End, More: ?α.β}). Das
et al. proved that parametrized type definitions over regular session types are
strictly more expressive than context-free session types. To some extent, this
analogy guides our approach: if adding abstraction (via parametric
polymorphism) to regular types leads to nested types, what exactly does it mean to add
abstraction (via a type-level λ-operator) to context-free types? Throughout this
paper we analyse several increments to System F^μ that culminate in adding
λ-abstraction to context-free session types.


One of our focuses is necessarily the analysis of the type equivalence problem. 
The uncertainty about the decidability of this problem over recursive parametric 
types goes back to the 1970s [16,63]. Although the type equivalence problem for 
parametric (nested) session types and context-free session types is decidable, 
that for the combination of abstractions over context-free types may no longer 
be. In fact, this analysis constitutes an interesting journey towards a better 
understanding of the role of higher-order polymorphic recursion in the presence of
sequential composition, as well as the gains (and losses) resulting from combining 
abstraction with arbitrary (rather than tail) recursion. 


Ultimately, decidability is not a sufficiently valuable measure regarding a type 
system’s practicality. We look for type systems that may be incorporated into 
compilers. For that reason, we are interested in algorithms for type equivalence 
checking. Equivalence in F^μ_ω alone is already at least as hard as equivalence of
deterministic pushdown automata. If we restrict recursion to the monomorphic
case (requiring recursion variables to denote proper types, that is, of kind s or T,
collectively denoted by *) we lower the complexity of type equivalence to that
of equivalence for finite-state automata. The extension with context-free session
types is slightly more complex. In order to obtain “good” algorithms, we restrict
the recursion to the monomorphic case, arriving at classes F^{μ_*}_ω, F^{μ_*·}_ω
and F^{μ_*;}_ω. Now the type equality problem for F^{μ_*;}_ω translates to the
equivalence problem for simple grammars, which is still decidable [4,33]. Since
F^{μ_*;}_ω subsumes F^{μ_*}_ω, our proof of the decidability of type equivalence
serves as an alternative to that of Cai et al. [14] (restricted to contractive types).


Higher-order polymorphism allows for the definition of type operators and 
the internalisation of various (session-type) constructs that would otherwise be 
offered as built-in constructors. In this way, we are able to internalise basic 
session-type constructors such as sequential composition ; and the Dual type op- 
erator (which reverses the direction of communication between parties). Duality 
is often treated as an external macro. Gay et al. [34] explore different ways of 
handling the dual operator, all in a monomorphic setting. In the presence of 
polymorphism the dual operator cannot be fully eliminated without introducing 
co-variables. Internalisation offers a much cleaner solution. 


Due to the presence of sequential composition, regular trees are not a power- 
ful enough model for representing types (type TreeC a in Section 2 is an exam- 
ple). The main technical challenge when combining System F and context-free 
session types is making sure that the resulting model can still be represented by 
simple grammars, so that type equivalence may be decided by a practical
algorithm. The difficulties arise with renaming bound variables. For infinite types,
both renaming with fresh variables and using de Bruijn indices may create an
infinite number of distinct variables, which makes the construction of a simple
grammar simply impossible. For example, take the type λα: T.μγ: T.λβ: T.α → γ,
which stands for the infinite type λα: T.λβ: T.α → λβ: T.α → λβ: T. ... Renaming
this type using a fresh variable at each step would result in a type of the form
λv1: T.λv2: T.v1 → λv3: T.v1 → λv4: T. ..., requiring infinitely many variables.
Similarly, de Bruijn indices [27] yield a type of the form
λ_T.λ_T.1 → λ_T.2 → λ_T.3 → ... that requires an infinite number of natural
indices. We thus introduce minimal renaming, which uses as few variable names
as possible (cf. Gauthier and Pottier [30]). This ensures that only finitely many
terminal symbols are necessary, allowing for translating types into simple grammars.


Type languages live in term languages and we propose a term language to
consume F^{μ;}_ω types. Based on Almeida et al. [2], we introduce a message-passing
concurrent programming language. Type checking is decidable if type equivalence
is, and it is, in particular, for F^{μ_*;}_ω.

The main contributions of this paper are as follows.


– The integration of (higher-order) context-free session types into System F^μ_ω,
dubbed F^{μ;}_ω.

– A semantic definition of type equivalence via a labelled transition system.

– The identification of a suitable fragment of System F^{μ;}_ω for which type
equivalence is reduced to the bisimilarity of simple grammars.

– A proof that type equivalence on the full System F^{μ;}_ω is at least as hard as
bisimilarity of deterministic pushdown automata.

– The first internalisation of the Dual type operator in a type language.

– A term language to consume F^{μ;}_ω types and an accompanying metatheory.


The type system presented in the paper combines three constructions: se- 
quential composition of session types, higher-order kinds via type-level abstrac- 
tion and application, and higher-order recursion. Prior to our work there is the 
system by Almeida et al. [4] which incorporates sequential composition and (first- 
order) recursion, but no higher-order kinds. There is also the system by Cai et 
al. [14] which incorporates higher-order kinds and higher-order recursion, but no 
sequential composition. Our system is the first to incorporate all three construc- 
tions. Although some of the results are incremental and generalize results from 
the literature, the main technical challenge is understanding the border past 
which they do not hold anymore. For example, “just” including higher-order 
kinds into the system by Almeida et al. does not work, since we need to pay 
close attention to variable names, making sure that type equivalence is invari- 
ant with respect to alpha-conversion (renaming of bound variables). This called 
for a novel notion of renaming, inspired by Gauthier and Pottier [30]. Similarly, 
“just” including sequential composition into the system of Cai et al. does not 
work, since finite-state automata (or regular trees) are not enough to capture 
the expressive power of the new type system, even when restricted to first-order 
recursion. This required us to look at the more expressive framework of simple 
grammars, and introduce a translation from types to words of a simple grammar. 

The rest of the paper is organised as follows. The next section motivates 
the type language and introduces the term language with an example. Section 3 
introduces System F^{μ;}_ω, Section 4 discusses type equivalence and Section 5 shows
that type equivalence is decidable for a fragment of the type language. Section 6 
presents the term language and its metatheory. Section 7 discusses related work 
and Section 8 concludes the paper with pointers for future work. Proofs for the 
main results can be found in a technical report on arXiv [20]. 


2 Motivation 


Our goal is to study type systems that combine equirecursion, higher-order poly- 
morphism, and higher-order context-free session types, while incorporating these 
in programming languages. 
Fig. 1: Six F-systems.


Extensions of System F. Figure 1 motivates the construction by proposing six
different type languages, culminating with F^{μ;}_ω. The initial system, F^μ, includes
well-known basic type operators [57]: functions T → U, records {l_i: T_i} and
variants ⟨l_i: T_i⟩. Type Unit is short for {}, the empty record; we can imagine
that Unit stands in place of an arbitrary scalar type such as Int and Bool. We also
include variable names α, type quantification ∀α: κ.T and recursion μα: κ.T.
To control type formation, all variable bindings must be kinded with some kind
κ, even if for the initial system, F^μ, we only use the functional kind T.

We then build on F^μ by considering (regular, tail recursive) session types;
we represent the resulting system by F^{μ·}. For example ?Int.!Bool.End is a type
for a channel endpoint that receives an integer, sends a boolean, and terminates.
At this point we introduce a kind s of session types to restrict the ways in
which we can combine session and functional types together. For example, a
well-formed type ?T.U is of kind s and requires U to be also of kind s (whereas
T can be of kind *, that is s or T). An example of an infinite session type is
μα: s.!Int.α that endlessly outputs integer values. For a more elaborate example
consider the type IntStream = μα: s.&{Done: End, More: ?Int.α} that specifies
a channel endpoint for receiving a (finite or infinite) stream of integer values.
Communication ends after choice Done is selected.

The next step of our construction takes us to context-free session types; the
resulting system is denoted by F^{μ;}. We introduce a new construct for sequential
composition T; U, and a new type Skip, acting as the neutral element of
sequential composition [68]. The message constructors are now unary (?T and
!T) rather than binary. In System F^{μ;} we distinguish between the traditional
End type and the Skip type. These types have different behaviours: End
terminates a channel, while Skip allows for further communication. Type equality is
more subtle for context-free session types, because of the monoidal semantics of
sequential composition. It is derivable from the following axioms:


Skip; T ~ T                          Neutral element
End; T ~ End                         Absorbing element
(T; U); V ~ T; (U; V)                Associativity
⊙{l_i: T_i}; U ~ ⊙{l_i: T_i; U}      Distributivity
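
The axioms above can be read as left-to-right simplification rules on finite
types. The following Python sketch is an illustration only and is not the
equivalence algorithm of the paper: the tuple encoding of types and the names
msg, seq, choice, compose and simplify are assumptions made for this example,
and recursive types are not covered.

```python
# Finite context-free session types encoded as tuples (an assumption of this sketch).
SKIP = ("Skip",)
END = ("End",)

def msg(polarity, payload):
    """?T or !T, with polarity "?" or "!"."""
    return ("Msg", polarity, payload)

def seq(t, u):
    """Sequential composition T; U."""
    return ("Seq", t, u)

def choice(view, branches):
    """Choice with view "&" or "+"; branches maps labels to types."""
    return ("Choice", view, tuple(sorted(branches.items())))

def compose(left, right):
    """Build left; right, where both arguments are already simplified."""
    if left == SKIP:                      # Skip; T ~ T
        return right
    if left == END:                       # End; T ~ End
        return END
    if left[0] == "Seq":                  # (T; U); V ~ T; (U; V)
        return compose(left[1], compose(left[2], right))
    if left[0] == "Choice":               # distribute the continuation over branches
        return ("Choice", left[1],
                tuple((label, compose(branch, right)) for label, branch in left[2]))
    return ("Seq", left, right)

def simplify(t):
    """Apply the four axioms, oriented left to right, throughout a finite type."""
    if t[0] == "Choice":
        return ("Choice", t[1], tuple((label, simplify(b)) for label, b in t[2]))
    if t[0] == "Seq":
        return compose(simplify(t[1]), simplify(t[2]))
    return t

if __name__ == "__main__":
    tree_step = seq(choice("&", {"Leaf": SKIP, "Node": msg("?", "Int")}), END)
    print(simplify(tree_step))
```

Under this encoding, Skip; ?Int simplifies to ?Int and End; !Bool simplifies to
End, while both bracketings of a three-element sequence simplify to the same
right-nested form.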
Fig. 2: Relation between the main classes of types in this paper (arrows denote
strict inclusions).


Although the syntax of F^{μ·} is not formally included in the syntax of F^{μ;},
we can embed recursive session types into context-free session types by mapping
?T.U into ?T; U and !T.U into !T; U. It is well-known that context-free session
types allow for higher computational expressivity: while F^μ and F^{μ·} can be
represented via finite-state automata, F^{μ;} can only be represented with simple
grammars [4,33].

To finalise our construction, we include type abstraction λα: κ.T and type
application T U. Again, type abstraction binds a variable which must be kinded.
Kinds can now be of the higher-order form κ ⇒ κ′. For each of the three systems
F^μ, F^{μ·}, F^{μ;} we arrive at a higher-order version, respectively F^μ_ω,
F^{μ·}_ω, F^{μ;}_ω (which we refer to collectively as the general higher-order
systems). In System F^{μ·}_ω, for example, we can specify channels for
receiving (finite or infinite) sequences of values of arbitrary (but fixed) types,


Stream = λα: T.(μβ: s.&{Done: End, More: ?α.β})


where α can be instantiated with the desired type; in particular, Stream Int would
be equivalent to the aforementioned IntStream.

It turns out that the expressive power of the general higher-order systems
is too large for practical purposes. Even the simplest case, F^μ_ω, is at least as
expressive as deterministic pushdown automata (or equivalently, first-order
grammars), for which known equivalence algorithms are notoriously impractical. By
impractical we mean that, although there exists a proof of decidability (due to
Sénizergues [61], later improved by Stirling and Jancar [46,65]), the underlying
algorithm is rather complex. To the best of our knowledge, there is no practical
implementation of an algorithm to decide the equivalence of deterministic
pushdown automata. This is essentially due to polymorphic recursion, which can be
encoded by a higher-order μ-operator (we provide an example at the end of
Section 5). Therefore, it makes sense to restrict the kind κ of the recursion operator
μα: κ.T. We use the notation μ_* to mean the subclass of types written using
only *-kinded recursion, i.e., μα: T.T or μα: s.T.

Figure 2 summarizes the main relations between the classes of types in our
paper. Firstly, we obtain a lattice where the expressive power increases as we
travel down (from functional to session to context-free session types) and right
(from simple polymorphism to higher-order polymorphism with monomorphic
recursion to arbitrary recursion). Four of the classes can be represented using
finite-state automata (up to F^{μ_*·}_ω). By including sequential composition
(F^{μ;} and F^{μ_*;}_ω) we are still able to represent types using simple grammars.
Once we allow for arbitrary recursion, the expressiveness of our model requires the
computational power of deterministic pushdown automata.


Programming with F^{μ;}_ω. We now turn our attention to the term language, a
message-passing, concurrent functional language, equipped with context-free session
types. Start with a stream of values of type a. Such a stream, when seen from
the side of the reader, offers two choices: Done and More. In the former case the
interaction is over; in the latter the reader reads a value of type a, as in ?a, and
recurses. This is the stream type we have seen before only that, rather than
closing the channel endpoint (with type End), it terminates with type Skip, so that
it may be sequentially composed with other types. In this informal introduction
to the term language we omit the kinds of type variables.


type Stream a = &{Done: Skip, More: ?a ; Stream a} 


A fold channel, as seen from the side of the folder, is a type of the following 
form. We assume that application binds tighter than semicolon, that is, type 
Stream a ; !b ; End is interpreted as (Stream a) ; !b ; End. 


type Fold a b = ?(b -> a -> b) ; ?b ; Stream a ; !b ; End


Consumers of this type first receive the folding function, then the starting
element, then the elements to fold in the form of a stream, and finally output the
result of the fold. The type terminates with End for we do not expect type Fold to
be further composed. Compare Fold with the type for a conventional functional
left fold: (b -> a -> b) -> b -> List a -> b.

We now develop a function that consumes a Fold channel. Syntax x |> f is for
the inverse function application with low priority, that is x |> f |> g = g (f x).
Recall that Unit is an alternative notation for the empty record type, {}.


foldServer : ∀a.∀b. Fold a b -> Unit
foldServer c = let (f, c) = receive c in
               let (e, c) = receive c in foldS f e c

foldS : ∀a.∀b. (b -> a -> b) -> b -> Stream a;!b;End -> Unit
foldS f e c = match c with
  { Done c -> c |> send e |> close
  , More c -> let (x, c) = receive c in foldS f (f e x) c
  }


Function foldServer consumes the initial part of the channel and passes the rest
of the channel to the recursive function foldS that consumes the whole stream
while accumulating the fold value. In the end, when branch Done is selected,
the fold value is written on the channel and the channel closed. In general, the
channel operators (receive, send, select) return the same channel in the form
of a new identifier. It is customary to reuse the identifier name (c in the example,
as in let (f, c) = receive c) since it denotes the same channel. Syntax c |> ...
hides the continuation channel. The case for the external choice (match) also
returns the continuation (in each branch) so that interaction on the channel
endpoint may proceed.

We may now write different clients for the foldServer. Examples include a 
client that generates a stream from a pair of integer values (denoting an inter- 
val); another that generates the stream from a list of values; and yet another 
that generates the stream from a binary tree. We propose a further client. Con- 
sider the type of a channel that exchanges trees in a serialized format [68]. Its 
polymorphic version, as seen from the point of view of the reader, is as follows: 


type TreeChannel a = TreeC a ; End 
type TreeC a = &{Leaf: Skip, Node: TreeC a;?a;TreeC a} 


We transform trees as we read from tree channels into streams. Function 
flatten receives a tree channel and a stream channel (as seen from the point of 
view of the writer, hence the Dual) and returns the unused part of the latter. 


flatten : ∀a.∀c. TreeChannel a -> (Dual Stream a);c -> c


We are now in a position to write a client that checks whether all values in 
a tree channel are positive. 


allPositive : TreeChannel Int -> Dual (Fold Int Bool) -> Bool
allPositive t c =
  let c = send (λx:Bool.λy:Int. x && y > 0) c in
  let c = send True c in
  let c = flatten [Int] [?Bool;End] t c in
  let (x, c) = receive c in
  close c; x


The client sends a function and the starting value on the fold channel. Then, 
it flattens the given tree t, receives the folded value and closes the channel. 
Syntax flatten [Int] [?Bool;End] is for term-level type application. We mean 
to flatten a tree of Int values on a stream channel whose continuation is of type 
?Bool;End. The continuation channel is bound to c so that we may further receive 
the fold value and thereupon close the channel. Syntax e1;e2 is for sequential 
composition and abbreviates let {} = e1 in e2 given that {}, the Unit value, is 
linear and hence must be consumed. 
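
The message order prescribed by the Fold protocol can be imitated outside the
typed setting. The Python sketch below is a single-threaded simulation under
simplifying assumptions: the channel is modelled as an iterator of messages, a
tree is either None (a leaf) or a triple (left, value, right), and flatten
works on an in-memory tree rather than on a tree channel. The names flatten,
fold_server and all_positive merely echo the functions in the text; no session
typing is enforced.

```python
from itertools import chain

def flatten(tree):
    """Yield the Stream messages (label More, then a value) for an in-order traversal."""
    if tree is None:                      # Leaf: contributes nothing to the stream
        return
    left, value, right = tree             # Node: left subtree, root value, right subtree
    yield from flatten(left)
    yield "More"
    yield value
    yield from flatten(right)

def fold_server(incoming):
    """Consume a Fold-shaped message sequence and return the folded value."""
    f = next(incoming)                    # ?(b -> a -> b)
    acc = next(incoming)                  # ?b
    while True:                           # Stream a
        label = next(incoming)
        if label == "Done":
            return acc                    # !b, after which the channel is closed
        acc = f(acc, next(incoming))

def all_positive(tree):
    """Client side: send the function, the start value, the stream, then Done."""
    messages = chain([lambda x, y: x and y > 0, True], flatten(tree), ["Done"])
    return fold_server(messages)

if __name__ == "__main__":
    sample = ((None, 1, None), 2, (None, 3, None))
    print(all_positive(sample))
```

The server reads exactly the sequence described by type Fold: a function, a
starting value, a stream of More-labelled values, and a final Done before the
result is produced.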

Finally, a simple application creates a new TreeC channel, passing one end 
to a thread that produces a tree channel. Function new creates a channel and 
returns its two ends. It then creates a Fold channel, distributes one end to a 
thread foldServer and the other to function allPositive. The fork primitive 
receives a suspended computation (a thunk, of the form λx:Unit.e) and creates
a new thread that runs in parallel with that from where the fork was issued. 


system : Bool
system = let (tr, tw) = new [TreeC Int] () in
         fork (λ_:Unit. produce tw);
         let (fr, fw) = new [Fold Int Bool] () in
         fork (λ_:Unit. foldServer fr);
         allPositive tr fw
*  ::=                  Kind of proper types
       s                session
       T                functional
κ  ::=                  Kind
       *                kind of proper types
       κ ⇒ κ            kind of type operators
T  ::=                  Type
       ι                type constant
       α                type variable
       λα: κ.T          type-level abstraction
       T T              type-level application

Fig. 3: The syntax of types.

ι  ::=                                    Type constant
       →               * ⇒ * ⇒ T          arrow
       {l_i}, ⟨l_i⟩    * ⇒ ... ⇒ * ⇒ T    record, variant
       μ_κ             (κ ⇒ κ) ⇒ κ        recursive type
       ∀_κ             (κ ⇒ *) ⇒ T        universal type
       Skip            s                  skip
       End             s                  end
       ?, !            * ⇒ s              input, output
       ;               s ⇒ s ⇒ s          seq. composition
       ⊙{l_i}          s ⇒ ... ⇒ s ⇒ s    choice operators
       Dual            s ⇒ s              dual operator

Fig. 4: Type constants and kinds.

Type renaming    rename_S(T)

rename_S(ι) = ι
rename_S(α) = α
rename_S(T U) = rename_{S ∪ fv(U)}(T) rename_S(U)
rename_S(λα: κ.T) = λv: κ.rename_S(T[v/α])    where v = first_S(λα: κ.T)

Fig. 5: Type renaming.


3 Kinds and Types 


This section introduces in detail System F^{μ;}_ω, an extension of System F^μ_ω
incorporating higher-order context-free session types. The syntax of types is
presented in Fig. 3. A type is either a constant ι (as in Fig. 4), a type variable α, an
abstraction λα: κ.T or an application T U. Besides incorporating the standard
session type constructors as constants, System F^{μ;}_ω also includes Dual as a
constant for a type operator mapping a session type to its dual. Note also that
∀α: κ.T is syntactic sugar for ∀_κ (λα: κ.T). Analogously, μα: κ.T abbreviates
μ_κ (λα: κ.T). This simplifies our analysis as lambda abstraction becomes the
only binding operator.

A distinction between session and functional types is made resorting to kinds
s and T, respectively. These are the kinds of proper types, *; we use the symbol
κ to represent either the kind of a proper type or that of a type operator, of the
form κ ⇒ κ′. A kinding context Δ stores kinds for type variables using bindings
of the form α: κ. Notation Δ + α: κ denotes the update of kinding context Δ,
defined as (Δ, α: κ) + α: κ′ = Δ, α: κ′ and Δ + α: κ = Δ, α: κ when α ∉ Δ.

To define type formation, we require a few notions. Firstly comes the notion
of renaming, adapted from Gauthier and Pottier [30] and presented in Fig. 5.
Renaming essentially replaces a type T by a minimal alpha-conversion of T. By
alpha-conversion we mean that rename_S(T) renames bound variables in T. By
“minimal” we mean that each bound variable is renamed to its lowest possible
value. We assume at our disposal a countable well-ordered set of type variables
{v1, ..., vn, ...}. In rename_S(T), parameter S is a set containing type variables
unavailable for renaming; at the outset of the renaming process S is the empty
set, since all variables are available. In that case the subscript S is often omitted.
The case for lambda abstraction renames the bound variable by the smallest
variable not in the set S ∪ fv(λα: κ.T), which we denote by first_S(λα: κ.T).

Renaming is what allows us to check whether type abstractions λα: κ.T and
λβ: κ.U are equivalent. For the types to be equivalent, both bound variables α
and β ought to be renamed to the same variable vj. In summary, renaming
provides a syntax-guided approach to the equivalence of lambda-abstractions, where
the names of bound variables should not matter. Our notion of type equivalence
preserves alpha-conversions up to renaming: if T and U only differ on bound
variables, then rename(T) = rename(U) and in particular rename(T) ~ rename(U).
We will come back to this point after we define type equivalence in Section 4.

We can easily see that renaming uses the minimum number of variable names
possible; for example, rename(λα: T.λβ: s.β) = λv1: T.λv1: s.v1. Notice how
both bound variables α and β are renamed to v1, the first variable available
for replacement. Also, renaming blatantly violates Barendregt’s variable
convention [9] used in so many works; for example rename(v1 (λα: T.α)) =
v1 (λv1: T.v1), where variable v1 is both free and bound in the resulting type.
Even if renaming violates the variable convention, substitution can still be
performed without resorting to the “on-the-fly” renaming of Curry and Feys [21,40].
When v1 ≠ v2, we have that


(λv1: κ.λv2: κ′.U) T reduces to rename((λv2: κ′.U)[T/v1]).


Then, we have (λv2: κ′.U)[T/v1] = λv2: κ′.(U[T/v1]) since the renaming rule
for application guarantees that v2 ∉ fv(T). Otherwise, if v1 = v2, we have
(λv1: κ.U)[T/v1] = λv1: κ.U. This justifies the inclusion of set S in the
renaming process. From now on, we assume that all types have gone through the
renaming process.
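
A minimal Python sketch of the renaming function of Fig. 5, under stated
assumptions: types are encoded as nested tuples, variables are strings, the
well-ordered set of variables is v1, v2, ..., and the substitution T[v/α] is
realised through an environment instead of textually (a choice of this sketch
that avoids variable capture). The names rename, first and free_vars mirror
the notation of the text but are otherwise hypothetical.

```python
from itertools import count

# Tuple encoding of types (an assumption of this sketch).
def const(name): return ("const", name)
def var(a): return ("var", a)
def lam(a, kind, body): return ("lam", a, kind, body)
def app(t, u): return ("app", t, u)

def free_vars(t):
    """Free type variables of a type."""
    tag = t[0]
    if tag == "const":
        return set()
    if tag == "var":
        return {t[1]}
    if tag == "lam":
        return free_vars(t[3]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def first(forbidden):
    """Smallest variable among v1, v2, ... that is not in the forbidden set."""
    return next(f"v{i}" for i in count(1) if f"v{i}" not in forbidden)

def rename(t, unavailable=frozenset(), env=None):
    """rename_S(T): S is `unavailable`; env maps bound names to their new names."""
    env = env or {}
    tag = t[0]
    if tag == "const":
        return t
    if tag == "var":
        return var(env.get(t[1], t[1]))
    if tag == "app":
        fun, arg = t[1], t[2]
        # The head is renamed with the free variables of the argument made unavailable.
        arg_fv = {env.get(a, a) for a in free_vars(arg)}
        return app(rename(fun, set(unavailable) | arg_fv, env),
                   rename(arg, unavailable, env))
    _, a, kind, body = t
    # Bound variable: smallest name outside S and the free variables of the abstraction.
    fv = {env.get(b, b) for b in free_vars(t)}
    v = first(set(unavailable) | fv)
    return lam(v, kind, rename(body, unavailable, {**env, a: v}))

if __name__ == "__main__":
    print(rename(lam("a", "T", lam("b", "s", var("b")))))
    print(rename(app(var("v1"), lam("a", "T", var("a")))))
```

The two printed types correspond to the two examples in the text: both binders
of the first type are renamed to v1, and in the second type v1 occurs both free
and bound. Alpha-equivalent abstractions are mapped to the same tuple.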

Next comes the notion of type reduction (Fig. 6). Apart from beta reduction
(rule R-β), the definition provides for sequential composition, for unfolding
recursive types and for reducing Dual T types. Note that renaming is further
invoked in rule R-β, for beta reduction does not preserve renaming: consider the
renamed type (λv1: T.λv2: T.v1 → v2) Unit. The type resulting from the
substitution (λv2: T.v1 → v2)[Unit/v1] is λv2: T.Unit → v2, which is not renamed
and, therefore, not equivalent to λv1: T.Unit → v1 according to our rules in
Section 4. Thanks to our modified rule R-β, we preserve renaming under reductions:
if T = rename(T) and T ⟶ U then U = rename(U).
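
The reduction rules for Dual push the operator inwards until it disappears.
As a rough illustration restricted to finite first-order session types, the
Python sketch below computes the dual in one pass. The tuple encoding and the
function names are assumptions of this sketch, and the treatment of choices
(external choice becomes internal choice and vice versa, with dualised
branches) is assumed to be the standard one.

```python
# Finite session types encoded as tuples (an assumption of this sketch).
SKIP = ("Skip",)
END = ("End",)

def msg(polarity, payload):
    """?T or !T, with polarity "?" or "!"."""
    return ("Msg", polarity, payload)

def seq(t, u):
    """Sequential composition T; U."""
    return ("Seq", t, u)

def choice(view, branches):
    """External choice (view "&") or internal choice (view "+")."""
    return ("Choice", view, tuple(sorted(branches.items())))

def dual(t):
    """Dual of a finite session type, following the shape of the Dual rules."""
    tag = t[0]
    if tag in ("Skip", "End"):            # Dual Skip is Skip, Dual End is End
        return t
    if tag == "Msg":                      # Dual (?T) is !T and Dual (!T) is ?T
        return ("Msg", "!" if t[1] == "?" else "?", t[2])
    if tag == "Seq":                      # Dual (T; U) is Dual T; Dual U
        return ("Seq", dual(t[1]), dual(t[2]))
    if tag == "Choice":                   # swap the view and dualise every branch
        view = "+" if t[1] == "&" else "&"
        return ("Choice", view, tuple((label, dual(b)) for label, b in t[2]))
    raise ValueError(f"not a session type: {t!r}")

if __name__ == "__main__":
    reader = choice("&", {"Done": SKIP, "More": seq(msg("?", "Int"), END)})
    print(dual(reader))
```

Note that the payload of a message is left untouched: only the direction of
the communication is reversed. On finite types, applying dual twice returns
the original type.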

We also need the notion of weak head normal form borrowed from the lambda
calculus [9,10]. We say that a type T is in weak head normal form, T whnf, if it
is irreducible, i.e., T ↛. Although this is a negative definition, in the technical
report we provide an equivalent, rule-based characterisation of weak head normal
Type reduction    T ⟶ T

R-SEQ1     Skip; T ⟶ T
R-SEQ2     T; U ⟶ V; U                               if T ⟶ V
R-ASSOC    (T; U); V ⟶ T; (U; V)
R-μ        μ_κ T ⟶ T (μ_κ T)
R-β        (λα: κ.T) U ⟶ rename(T[U/α])
R-TAPPL    T V ⟶ U V                                 if T ⟶ U
R-D;       Dual (T; U) ⟶ Dual T; Dual U
R-DSKIP    Dual Skip ⟶ Skip
R-DEND     Dual End ⟶ End
R-D?       Dual (?T) ⟶ !T
R-D!       Dual (!T) ⟶ ?T
R-D&       Dual (&{l_i: T_i}) ⟶ ⊕{l_i: Dual T_i}
R-D⊕       Dual (⊕{l_i: T_i}) ⟶ &{l_i: Dual T_i}
R-DCTX     Dual T ⟶ Dual U                           if T ⟶ U
R-DDVAR    Dual (Dual (α T1 ... Tm)) ⟶ α T1 ... Tm

Fig. 6: Type reduction.


Type formation    Δ ⊢ T : κ

K-CONST    Δ ⊢ ι : κ                  if ι has kind κ in Fig. 4
K-VAR      Δ ⊢ α : κ                  if α: κ ∈ Δ
K-TABS     Δ ⊢ λα: κ.T : κ ⇒ κ′       if Δ + α: κ ⊢ T : κ′
K-TAPP     Δ ⊢ T U : κ′               if Δ ⊢ T : κ ⇒ κ′, Δ ⊢ U : κ and T U norm

Fig. 7: Type formation.


form types, which can be used in a compiler as well as in our proofs. We say that
type T normalises to type U, written T ⇓ U, if U whnf and U is reached from T
in a finite number of reduction steps (note that any term which is already whnf
normalises to itself). We write T norm to denote that T ⇓ U for some U.

For example, suppose we want to normalise the type μ_s T, where T is the type
λv1: s.⊕{Done: End, More: !α}; Dual v1. By computing all reductions from μ_s T,
we obtain μ_s T ⟶ T (μ_s T) ⟶ ⊕{Done: End, More: !α}; Dual (μ_s T) ↛, from
which we conclude that μ_s T ⇓ ⊕{Done: End, More: !α}; Dual (μ_s T). Similarly,
we can reason that μ_T (λv1: T.v1), μ_s (λv1: s.Skip; v1) and μ_s (λv1: s.Dual v1)
are all examples of non-normalising expressions.

Equipped with normalisation, we can introduce type formation, which we
do via the rules in Fig. 7. Rule K-CONST introduces constants as types whose
kinds match those of Fig. 4. Rule K-VAR reads the kind of a type variable from
context Δ. An abstraction λα: κ.T is a well-formed type with kind κ ⇒ κ′ if T
is well formed in context Δ updated with entry α: κ (rule K-TABS). The update
is necessary since we are dealing with renamed types and the same type variable
may appear with different kinds in nested abstractions.

It is not until we reach rule K-TApp that we find a proviso about the normal- 
isation of a type. This is standard and analogous to a condition on contractivity. 
The goal is to eliminate types that reduce indefinitely without reaching a whnf. 


Theorem 1. Let Δ ⊢ T : κ.


Preservation. If T ⟶ U, then Δ ⊢ U : κ.
Confluence. If T ⟶ U and T ⟶ V, then U ⟶* W and V ⟶* W.
Weak normalisation. T ⇓ U for some U. Furthermore, if T ⇓ V, then U = V.


We finally arrive at the main decidability result in this section. In its proof,
we make use of the fact that recursion is restricted to kind * to limit the possible
subexpressions of the form μ_κ U that might appear in the normalisation of T.


Theorem 2 (Decidability of type formation). Δ ⊢ T : κ is decidable for
types in F^{μ_*;}_ω.


4 Type equivalence 


This section introduces type bisimulation as our notion of type equivalence. We 
define a labelled transition system (LTS) on the space of all types and write 
T -a→ U to denote that T has a transition by label a to U. The grammar for 
labels and the LTS rules are in Fig. 8. 

If T is not in weak head normal form, then we must normalise it to some 
type U, so that T has the same transitions as U (rule L-Red). Otherwise, if 
T whnf, then the transitions of T can be immediately derived by looking at the 
corresponding rule for T as follows. If T is a variable, use rule L-Var1 (with 
m = 0). If T is a constant (other than Skip), use rule L-Const. Note that if T is 
a lone Skip, then it has no transitions. If T is an abstraction, use rule L-Abs. 

If T is an application, then we need to look inside the head. We write T as 
T₀ T₁ ... Tₘ with m ≥ 1 where T₀ is not an application, and look at T₀. If T₀ 
is a variable, use rules L-Var1 and L-Var2. If T₀ is one of the constants →, ∀_κ, 
⊙{lᵢ} or ⦅lᵢ⦆, use rule L-ConstApp. Note that T₀ is neither an abstraction nor 
μ_κ, since T is in weak head normal form. If T₀ is ♯, we use rules L-Msg1 and 
L-Msg2. If T₀ is Dual, then the only way for T to be well-formed and in weak 
head normal form is if m = 1 and T₁ is α or α U₁ ... Uₘ, in which case we use 
rules L-DualVar1 and L-DualVar2. 

If T₀ is ; , we require an additional case analysis on T₁. If m = 1, use rule 
L-Seq1. Otherwise m = 2 due to kinding. If T₁ is a variable, use rule L-VarSeq1 
(with m = 0). If T₁ is a constant, then it must be of kind s. T₁ cannot be Skip, 
because T is in weak head normal form, so it must be End, in which case we use rule 
L-EndSeq (End is an absorbing element, so End; U simply makes a transition to 
Skip without executing U). Note that T₁ cannot be an abstraction 
due to kinding. 

[Figure 8: inference rules not recoverable from the extracted text. The figure gives the grammar of transition labels, a ::= αᵢ | ιᵢ | λα: κ (i ≥ 0, ι ≠ Skip), and the rules of the labelled transition system T -a→ U: L-Red, L-Var1, L-Var2, L-Const, L-ConstApp, L-Abs, L-Msg1, L-Msg2, L-Seq1, L-VarSeq1, L-VarSeq2, L-MsgSeq1, L-MsgSeq2, L-ChoiceSeq, L-EndSeq, L-DualVar1, L-DualVar2, L-DualSeq1 and L-DualSeq2.] 

Fig. 8: Labelled transition system for types. 


If T₁ is an application, then again we write T₁ as U₀ U₁ ... Uₙ with n ≥ 1 
where the head U₀ is not an application, and look at U₀. If U₀ is a variable, use 
rules L-VarSeq1 and L-VarSeq2. If U₀ is a constant, it must be one of ; , μ_κ, ♯, 
⊙{lᵢ} or Dual due to kinding. If U₀ is ♯, use rules L-MsgSeq1 and L-MsgSeq2. 
If U₀ is ⊙{lᵢ}, use rule L-ChoiceSeq. If U₀ is Dual, the only way for T to be 
well-formed and in weak head normal form is if n = 1 and U₁ is α or α V₁ ... V_ℓ, 
in which case we use rules L-DualSeq1 and L-DualSeq2. Note that U₀ cannot 
be ; , μ_κ or an abstraction, since T is in weak head normal form. 


Let us clarify our LTS rules with an example. Consider the following type 
λv₁: T.μv₂: s.⊕{Done: End, More: !v₁}; Dual v₂ and call it T. T is a type 
abstraction (on type variable v₁), of kind T ⇒ s. It specifies a channel alternating 
between: offer a choice and output a value of type v₁; or select a choice and 
input a value of type v₁. The polarity is swapped thanks to the application of 
constant Dual to the recursion variable v₂. To construct the (fragment of the) 
LTS generated by this type, let us first desugar T into λv₁: T.U where U is the 
type μ_s (λv₂: s.⊕{Done: End, More: !v₁}; Dual v₂). Notice that U normalises to 
⊕{Done: End, More: !v₁}; Dual U. The LTS for the example is sketched in Fig. 9. 
In this case, only finitely many types appear. However, more elaborate examples 
involving sequential composition or higher-order recursion may lead to an 
infinite graph of transitions. 

[Figure 9: transition diagram not recoverable from the extracted text.] 

Fig. 9: The LTS for type λv₁: T.U. Normalisation T₁ ⇓ T₂ is represented as T₁ ⟹ 
T₂ and U is a shorthand for type μ_s (λv₂: s.⊕{Done: End, More: !v₁}; Dual v₂). 

Given the LTS rules, we can define, in the standard way, a notion of bisimulation. 
A binary relation R on types is called a bisimulation if, for every (T, U) ∈ R 
and every transition label a: 


1. if T -a→ T′, then there exists U′ s.t. U -a→ U′ and (T′, U′) ∈ R; 
2. if U -a→ U′, then there exists T′ s.t. T -a→ T′ and (T′, U′) ∈ R. 


We say that types T and U are bisimilar, written T ∼ U, if there exists a 
bisimulation R such that (T, U) ∈ R. 
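
As an illustration of the two clauses above, the following sketch checks bisimilarity on a small, finite LTS by starting from the full relation and discarding pairs that violate either clause. It is not the decision procedure of this paper (the LTS on types may be infinite); the dictionary encoding, the function names and the example states are assumptions made for the sketch.

```python
from itertools import product


def moves_matched(lts, rel, p, q, flip):
    """Check that every move p -a-> p2 is answered by some q -a-> q2.

    With flip=False the residual pair (p2, q2) must be in rel (clause 1);
    with flip=True the pair is looked up as (q2, p2) (clause 2).
    """
    for label, successors in lts[p].items():
        for p2 in successors:
            answers = lts[q].get(label, ())
            if not any(((q2, p2) if flip else (p2, q2)) in rel for q2 in answers):
                return False
    return True


def bisimilar(lts, s, t):
    """Naive greatest-fixpoint check on a finite LTS {state: {label: {successors}}}."""
    rel = set(product(lts, lts))  # start from the full relation on states
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            ok = moves_matched(lts, rel, p, q, False) and moves_matched(lts, rel, q, p, True)
            if not ok:
                rel.discard((p, q))
                changed = True
    return (s, t) in rel


# Hypothetical example: A and C both loop forever on label "a"; D stops after one step.
EXAMPLE = {
    "A": {"a": {"B"}},
    "B": {"a": {"A"}},
    "C": {"a": {"C"}},
    "D": {"a": {"E"}},
    "E": {},
}
```

Here `bisimilar(EXAMPLE, "A", "C")` holds, whereas A and D are distinguished after one step because E has no transitions.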

Intuitively, a notion of type equivalence must preserve and reflect the syntax 
of type constructors: for example, a type T → U is equivalent to a type 
T′ → U′ iff T, T′ are equivalent and U, U′ are equivalent. Using the bisimulation 
technique, we achieve this by considering a labelled transition system on 
types: T → U has a transition labelled →₁ to T and a transition labelled →₂ to 
U. In this way, T → U can only be equivalent to another type which has two 
transitions with those same labels. For each of the type constructors (→, ∀_κ, !, ?, 
⊙{lᵢ}, and so on) we have suitable transition rules. Moreover, a type sometimes 
needs to be reduced before a type constructor is found at the root of the syntax 
tree. If T normalises to U, then we expect T and U to be bisimilar, which 
is achieved thanks to rule L-Red. This handles the various reductions: beta-reductions 
arising from lambda-abstraction and applications (e.g., (λα: κ.T) U 
reduces to rename(T[U/α])), reductions arising from the monoidal structure of 
sequential composition (e.g., Skip; T reduces to T), reductions arising from the 
internalisation of duality as a type constructor (e.g., Dual (!T) reduces to ?T) 
and reductions arising from the recursion (e.g., μ_κ T reduces to T (μ_κ T)). 

Our notion of type equivalence enjoys natural properties and behaves as 
expected with respect to the notions of reduction, normalisation and kinding 
from Section 3. We can derive rules for type equivalence, that could be used 
to define another coinductive notion of equivalence, via effective syntax-directed 
rules. We can show that type equivalence is preserved under renaming, reduction 
and normalisation. We can also show that the axioms for sequential composition 
in the introduction (1) are derivable from our notion of bisimulation. These 
additional results are presented in the technical report [20]. 


5 Decidability of type equivalence 


This section presents results on decidability of type equivalence. Our approach 
consists in translating types to objects in some computational model. We look 
at finite-state automata (for types in F^μ, F^{μ·}, F^{μ∗}_ω and F^{μ∗·}_ω), simple grammars 
(for types in F^{μ;} and F^{μ∗;}_ω) and deterministic pushdown automata (for types in 
F^μ_ω, F^{μ·}_ω and F^{μ;}_ω). 

We say that a grammar in Greibach normal form is a tuple (𝒯, 𝒩, γ, ℛ) 
where: 𝒯 is a set of terminal symbols, denoted by a, b, c; 𝒩 is a set of nonterminal 
symbols, denoted by X, Y, Z; γ ∈ 𝒩* is the starting word; and ℛ ⊆ 𝒩 × 𝒯 × 𝒩* 
is a set of productions. A grammar is said to be simple if, for every nonterminal 
X and every terminal a, there is at most one production (X, a, δ) ∈ ℛ [51]. 

Greek letters γ and δ denote (possibly empty) words of nonterminal symbols. 
Productions are written as X -a→ δ. We define a notion of bisimulation for 
grammars via a labelled transition system. The system comprises a set of states 
𝒩* corresponding to words of nonterminal symbols. For each production X -a→ 
γ and each word of nonterminal symbols δ, we have a labelled transition Xδ -a→ 
γδ. We let ≈ denote the bisimulation relation for grammars (the definition is 
similar to that in Section 4). 
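
The definitions above can be sketched in a few lines. The encoding below (productions as triples, words as tuples of nonterminal names) and the toy grammar are assumptions chosen for illustration; they are not taken from the paper.

```python
def is_simple(productions):
    """A grammar is simple if no pair (nonterminal, terminal) has two productions."""
    seen = set()
    for (x, a, _delta) in productions:
        if (x, a) in seen:
            return False
        seen.add((x, a))
    return True


def transitions(productions, word):
    """Labelled transitions of a word: X -a-> gamma yields X delta -a-> gamma delta."""
    if not word:
        return []  # the empty word has no transitions
    head, rest = word[0], tuple(word[1:])
    return [(a, tuple(gamma) + rest) for (x, a, gamma) in productions if x == head]


# Hypothetical toy grammar: X -a-> X Y, X -b-> (empty word), Y -c-> (empty word).
TOY = [("X", "a", ("X", "Y")), ("X", "b", ()), ("Y", "c", ())]
```

Only the leftmost nonterminal is rewritten, so the remainder of the word acts as a stack; this is what lets words of nonterminals model sequential composition.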

For the moment we focus on the class F^{μ∗;}_ω and we explain how to convert 
a type T into a simple grammar (𝒯_T, 𝒩_T, word(T), ℛ_T). The conversion is 
based on a function word(T) that maps each type T into a word of nonterminal 
symbols, while introducing fresh nonterminals and productions. In our construction, 
following the approach by Costa et al. [19], we use a nonterminal symbol 
with no productions, denoted by ⊥, in order to separate the two descendants 
of a send/receive operation such as !T; U. The sequence of nonterminal symbols 
word(T) is defined as follows. First consider the cases in which T whnf. 


— For any m ≥ 0: word(α T₁ ... Tₘ) = Y for Y a fresh nonterminal symbol 
with a production Y -α₀→ ε as well as Y -αⱼ→ word(Tⱼ)⊥ for each 1 ≤ j ≤ m. 

— word(Skip) = ε. 

— word(End) = Y for Y a fresh symbol with a single production Y -End→ ⊥. 

— for any ι ≠ Skip, End: word(ι) = Y for Y a fresh nonterminal symbol with a 
single production Y -ι→ ε. 

— word(λα: κ.T) = Y for Y a fresh symbol with a production Y -λα:κ→ word(T). 

— for any m ≥ 1 and for ι one of →, ∀_κ, ⊙{lᵢ}, ⦅lᵢ⦆: word(ι T₁ ... Tₘ) = Y for a 
fresh nonterminal Y with a production Y -ιⱼ→ word(Tⱼ) for each 1 ≤ j ≤ m. 

— word(♯T) = Y for Y fresh with productions Y -♯₁→ word(T)⊥ and Y -♯₂→ ε. 

— word(; T) = Y for Y a fresh symbol with a production Y -;₁→ word(T). 

— word(T; U) = word(T) word(U). 

— word(Dual (α T₁ ... Tₘ)) = Y for Y a fresh symbol with productions Y -Dual₁→ 
word(α T₁ ... Tₘ)⊥ and Y -Dual₂→ ε. 


Finally, let us handle the cases where T is not in weak head normal form. 


— If T ⇓ Skip, then word(T) = ε. 

— Otherwise if T ⇓ U ≠ Skip, then word(T) = Y for Y a fresh nonterminal 
symbol. Let Zδ = word(U). Then Y has a production Y -a→ γδ for each 
production Z -a→ γ. 


In the above construction, we create fresh symbols each time we encounter a 
weak head normal form other than Skip. In other words, 𝒩_T is the set containing 
⊥ and all nonterminals Y created during the computation of word(T). Another 
key insight is that the sequential composition of types is translated into a 
concatenation of words: word(T₁; T₂; ...; Tₙ) = word(T₁) word(T₂) ... word(Tₙ). 
This allows our construction to terminate: even if the transitions lead to infinitely 
many types, they are split on the sequential composition operator, and 
so we only need to consider finitely many subexpressions. 

For the last case in our construction to be well-defined, i.e., when T ⇓ U ≠ 
Skip, we require word(U) to be non-empty. Indeed, if U whnf, then we can observe 
(by inspecting all cases) that word(U) = ε iff U = Skip. We also need to argue 
that the construction of word(T) eventually terminates. For this, we keep track of 
all types visited during the construction, and we only add a fresh nonterminal Y 
to our grammar if the type visited is syntactically different from all types visited 
so far. Therefore, we reuse the same symbol Y with the same productions each 
time we revisit a type. With all these observations, we get the following result. 


Lemma 1. Suppose that T ∈ F^{μ∗;}_ω. Then the construction of word(T) terminates 
producing a simple grammar. 
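
The termination argument relies on reusing one nonterminal per syntactically distinct type. A minimal sketch of that bookkeeping follows; the class name, the method name and the representation of types as plain strings are assumptions for illustration, not part of the construction as published.

```python
class SymbolTable:
    """Assigns one nonterminal per syntactically distinct type visited so far."""

    def __init__(self):
        self._symbols = {}

    def nonterminal_for(self, type_syntax):
        """Return (symbol, is_new); is_new tells the caller whether productions
        for this type still have to be generated."""
        if type_syntax in self._symbols:
            return self._symbols[type_syntax], False
        symbol = f"X{len(self._symbols)}"
        self._symbols[type_syntax] = symbol
        return symbol, True
```

A caller would generate productions only when `is_new` is true; revisiting a type returns the existing symbol, so the set of nonterminals stays bounded by the number of distinct types visited.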


We illustrate the above construction with the polymorphic tree exchanging 
example from Section 2, 


type TreeC a = &{Leaf: Skip, Node: TreeC a; ?a ; TreeC a} 


that is written in F^{μ∗;}_ω as T₀ = λv₁: T.μv₂: s.&{Leaf: Skip, Node: v₂; ?v₁; v₂}. 
For ease of notation, in this example we write &ᵢ as shorthand for &{Leaf, Node}ᵢ. 

Since T₀ is in weak head normal form, word(T₀) returns a fresh symbol, which 
we call X₀. We also have a production X₀ -λv₁:T→ word(T₁), where T₁ is the 
type μv₂: s.&{Leaf: Skip, Node: v₂; ?v₁; v₂}. Since T₁ is not in whnf, we must 
normalise it, to get T₂ = &{Leaf: Skip, Node: T₁; ?v₁; T₁}. Therefore word(T₁) 
returns a fresh symbol, which we call X₁. To obtain the transitions of X₁, we 
must first compute word(T₂), which is a fresh symbol X₂ with transitions X₂ -&₁→ 
word(Skip) and X₂ -&₂→ word(T₁; ?v₁; T₁). Thus we also get X₁ -&₁→ word(Skip) 
and X₁ -&₂→ word(T₁; ?v₁; T₁). 

We have word(Skip) = ε, but we still need to compute word(T₁; ?v₁; T₁). 
This type normalises to T₃ = T₂; ?v₁; T₁ since T₁ ⇓ T₂. Thus word(T₁; ?v₁; T₁) 
is a fresh symbol X₃. To obtain the productions of X₃ we must compute 
word(T₂; ?v₁; T₁) = word(T₂) word(?v₁) word(T₁). At this point we already have 
word(T₁) = X₁ and word(T₂) = X₂. We still need to compute word(?v₁), which 
is a fresh symbol X₄ with productions X₄ -?₁→ word(v₁)⊥ and X₄ -?₂→ ε. In 
turn, word(v₁) is a fresh symbol X₅ with a production X₅ -(v₁)₀→ ε. Finally, we get 
word(T₂; ?v₁; T₁) = X₂X₄X₁, which means we can write the productions for X₃: 
X₃ -&₁→ X₄X₁ and X₃ -&₂→ X₃X₄X₁. 

Putting all this together, we can finally obtain the simple grammar: 


X₀ -λv₁:T→ X₁     X₁ -&₁→ ε     X₁ -&₂→ X₃     X₂ -&₁→ ε     X₂ -&₂→ X₃ 
X₃ -&₁→ X₄X₁     X₃ -&₂→ X₃X₄X₁     X₄ -?₁→ X₅⊥     X₄ -?₂→ ε     X₅ -(v₁)₀→ ε 
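
To see how this grammar behaves as an LTS on words, the sketch below encodes the productions derived in the worked example and replays label sequences from X₀. The ASCII label spellings (`"&1"`, `"?2"`, `"v1_0"`) and the function names are assumptions of the sketch; keying the dictionary by (nonterminal, label) builds in the "at most one production" condition of simple grammars.

```python
# Productions from the worked example, keyed by (nonterminal, label).
PRODUCTIONS = {
    ("X0", "λv1:T"): ("X1",),
    ("X1", "&1"): (),
    ("X1", "&2"): ("X3",),
    ("X2", "&1"): (),
    ("X2", "&2"): ("X3",),
    ("X3", "&1"): ("X4", "X1"),
    ("X3", "&2"): ("X3", "X4", "X1"),
    ("X4", "?1"): ("X5", "⊥"),  # ⊥ has no productions: it cuts off the payload branch
    ("X4", "?2"): (),
    ("X5", "v1_0"): (),
}


def step(word, label):
    """Rewrite the leftmost nonterminal; None if the word has no such transition."""
    if not word:
        return None
    gamma = PRODUCTIONS.get((word[0], label))
    if gamma is None:
        return None
    return gamma + word[1:]


def run(word, labels):
    """Follow a sequence of labels, returning the final word or None if stuck."""
    for label in labels:
        word = step(word, label)
        if word is None:
            return None
    return word
```

Choosing Node twice gives `run(("X0",), ["λv1:T", "&2", "&2"]) == ("X3", "X4", "X1")`, and each further `&2` lengthens the word by two symbols, so infinitely many words are reachable even though the grammar is finite.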


Next, we argue that type equivalence (i.e., bisimilarity on types) corresponds 
to bisimilarity on the corresponding grammars. This is achieved by the following 
lemma, that asserts that the LTS of a type and the LTS of the corresponding 
word of nonterminals have exactly the same transitions. 


Lemma 2 (Full abstraction). Let T ∈ F^{μ∗;}_ω and (𝒯_T, 𝒩_T, word(T), ℛ_T) the 
corresponding simple grammar. Suppose also that word(T) ≈ γ. 


1. If T -a→ U then there exists γ′ such that γ -a→ γ′ and word(U) ≈ γ′. 
2. If γ -a→ γ′ then there exists U such that T -a→ U and word(U) ≈ γ′. 


As a consequence of the above result, we get soundness and completeness 
of the bisimilarity word(T) ≈ word(U) with respect to the bisimilarity T ∼ U. 
Indeed by Lemma 2, any sequence of transitions starting from T can be matched 
by a sequence of transitions starting from word(T); and similarly for U. Thus 
T ∼ U iff word(T) ≈ word(U). 


Theorem 3. The type equivalence problem is decidable for types in F^{μ∗;}_ω. 


For the remainder of this section, we look at the other classes of types in 
Fig. 2 and examine the computation models they correspond to. Since class F^{μ;} 
is contained in F^{μ∗;}_ω, we can express types without λ-abstractions with simple 
grammars as well. In this way we recover previous results in the literature [4,19]. 

Let us now look at the class F^{μ∗·}_ω. In this class we have neither Skip nor 
sequential composition, and message operators are binary (♯T.U) rather than 
unary. Since we do not have sequential composition, there is no need to consider 
words of nonterminals, and instead it suffices to translate types into single symbols, 
i.e., states in an automaton. Moreover, since there is no recursion beyond 
μ_∗, only finitely many types can be reached from a given T. We can thus adapt 
our construction as follows for F^{μ∗·}_ω. In the definition of the LTS (Fig. 8): 

— discard all rules involving sequential composition; 

— discard rules L-Var1 for m > 0 and L-DualVar2 (they were only needed to 
distinguish types in sequential composition); 

— discard case ι = End in rule L-Const (so that End no longer has transitions); 

— replace Skip with End on the right-hand side of rules L-Var1 with m = 0 
and L-Const; 

— discard rules L-Msg1 and L-Msg2 and treat ι = ♯ like the other constants in 
rule L-ConstApp. 


Also replace the construction of word(T) by a construction of state(T), associating 
to each type T a state in a finite-state automaton. For each transition 
T -a→ U we have the corresponding transition state(T) -a→ state(U). Notice 
that the resulting automaton is deterministic since the original LTS is also deterministic 
(for each type T and label a, there is at most one transition T -a→ U). 
Since bisimilarity of deterministic finite-state automata can be decided in polynomial 
time [44], we get the following results. 


Theorem 4. 


1. To each type T in F^{μ∗·}_ω we can associate a finite-state automaton corresponding 
to the (fragment of the) LTS generated by T. 
2. The type equivalence problem is polynomial-time decidable for types in F^{μ∗·}_ω. 
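
For a deterministic, finite LTS, bisimilarity can be checked by exploring pairs of states in lockstep, which visits at most quadratically many pairs. The sketch below shows this generic technique and is not necessarily the algorithm of [44]; the automaton encoding, the state names and the labels are hypothetical.

```python
from collections import deque


def equivalent(delta, s, t):
    """Bisimilarity for a deterministic LTS given as {state: {label: next_state}}."""
    seen = {(s, t)}
    queue = deque([(s, t)])
    while queue:
        p, q = queue.popleft()
        if delta[p].keys() != delta[q].keys():
            return False  # one side offers a label the other cannot match
        for label, p2 in delta[p].items():
            pair = (p2, delta[q][label])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True  # the visited pairs form a bisimulation


# Hypothetical automaton: t0 is a one-state loop, u0/u1 the same loop unrolled once.
AUTOMATON = {
    "t0": {"!1": "int", "!2": "t0"},
    "u0": {"!1": "int", "!2": "u1"},
    "u1": {"!1": "int", "!2": "u0"},
    "v0": {"!1": "int", "!2": "end"},
    "int": {"Int": "end"},
    "end": {},
}
```

In this example `t0` and `u0` are equivalent, mirroring the way an equirecursive type is identified with its unrolling, while `v0` differs because its continuation has no transitions.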


Clearly, Theorem 4 applies to the subclasses of F^{μ∗·}_ω: F^μ, F^{μ·} and F^{μ∗}_ω. In 
this way we recover previous results in the literature [14,19,33]. 

Finally, we consider the classes F^μ_ω, F^{μ·}_ω and F^{μ;}_ω involving arbitrarily-kinded 
recursion. We shall show that these classes are already powerful enough to simulate 
deterministic pushdown automata; hence, the type equivalence problem becomes 
impractical (i.e., no practical implementation of an algorithm is known). 
We only focus on the simplest case F^μ_ω, as the other two classes are even more 
expressive. Instead of looking at deterministic pushdown automata, we look at 
deterministic first-order grammars, which constitute an equivalent model of computation [46]. 
This choice simplifies our construction. We say that a first-order 
grammar is a tuple (𝒳, 𝒯, 𝒩, E, ℛ) where: 


— 𝒳 is a set of variables α, β, ...; 𝒯 is a set of terminal symbols a, b, ...; 𝒩 is 
a set of nonterminal symbols X, Y, ... 

— each nonterminal X has an arity m = arity(X) ∈ ℕ. 

— the set ℰ of expressions over 𝒳, 𝒩 is inductively defined by two rules: any 
variable α is an expression; if arity(X) = m and E₁, ..., Eₘ are expressions, 
then so is X E₁ ... Eₘ. Whenever m = 0, X is called a constant. 

— E is an expression over 𝒩, called the initial expression. 

— ℛ is a set of productions. Each production is a triple (X, a, E), written as 
X α₁ ... αₘ -a→ E, where m = arity(X) and the variables in E must be 
taken from α₁, ..., αₘ. 


A first-order grammar is deterministic if, for every X and a, there is at most 
one production (X, a, E) ∈ ℛ. 

Just as a simple grammar defines an LTS over words of nonterminals, a first-order 
grammar defines an LTS over the set of closed expressions. For each 
production X α₁ ... αₘ -a→ E we have the labelled transition X E₁ ... Eₘ -a→ 
E[E₁/α₁, ..., Eₘ/αₘ]. 

Let ≈ denote bisimilarity over closed expressions according to a first-order 
grammar. We now present a fully abstract (i.e., preserving bisimilarity) translation 
of a deterministic first-order grammar into a type in F^μ_ω. Each grammar 
variable α has a corresponding type variable α (of kind T). An expression 
X E₁ ... Eₘ is represented as a type application X E₁ ... Eₘ. If X has arity 
m and the productions X α₁ ... αₘ -aⱼ→ Eⱼ for a range of j, then we write the 
equation specifying X as a record (since the first-order grammar is deterministic, 
all record labels are distinct, and thus the right-hand side of the equation 
specifying X is well-formed). 

X = λα₁: T. ... λαₘ: T.{aⱼ: Eⱼ} 


This gives rise to a system of equations {Xᵢ = Tᵢ}, one for each nonterminal Xᵢ, 
where the nonterminals may appear in the right-hand sides Tᵢ. Finally, given 
an initial expression E, it is standard how to convert it into a μ-type using the 
system above. 

Using the above translation, we are able to simulate a transition E -a→ F of 
the first-order grammar as a transition between the corresponding types. 
Therefore, the translation is fully abstract and we get the following result. 


Theorem 5. Let E and F be closed expressions on a first-order grammar and 
E, F the corresponding types. Then E ≈ F iff E ∼ F. 


Let us work on an example to better understand the above translation. Consider 
the language L₃ = {ℓⁿ a rⁿ a | n ≥ 0} ∪ {ℓⁿ b rⁿ b | n ≥ 0} over the alphabet 
{a, b, ℓ, r}. L₃ is a typical example of a language that cannot be described 
with a simple grammar, but can be accepted by a deterministic pushdown automaton [51]. 
Consider the first-order grammar with nonterminals X, R, A, B, ⊥, 
initial expression X A B, and productions 


X α β -ℓ→ X (R α) (R β)     X α β -a→ α     X α β -b→ β 
R α -r→ α     A -a→ ⊥     B -b→ ⊥ 


Note that ⊥ is a constant without productions. It is easy to see that the traces 
of this first-order grammar correspond exactly to the words in L₃. By following 
the steps in the above translation, we arrive at the system of equations 


X = λα: T.λβ: T.{ℓ: X (R α) (R β), a: α, b: β}     R = λα: T.{r: α} 


Therefore, the initial expression X A B becomes the type 

(μX: T ⇒ T ⇒ T.λα: T.λβ: T.{ℓ: X {r: α} {r: β}, a: α, b: β}) {a: {}} {b: {}}, 


whose transitions simulate the transitions of the first-order grammar. 
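
The claim that the traces of this grammar are exactly the words of L₃ can be explored with a small simulator of the first-order LTS, in which a transition substitutes the arguments for the variables of the production. The tuple encoding of expressions, the ASCII letter `l` standing for ℓ, and the function names are assumptions of this sketch.

```python
# Productions of the example grammar: (nonterminal, label) -> (parameters, body).
# Variables are strings; applications are tuples (nonterminal, arg1, ..., argm).
PRODUCTIONS = {
    ("X", "l"): (("α", "β"), ("X", ("R", "α"), ("R", "β"))),
    ("X", "a"): (("α", "β"), "α"),
    ("X", "b"): (("α", "β"), "β"),
    ("R", "r"): (("α",), "α"),
    ("A", "a"): ((), ("⊥",)),
    ("B", "b"): ((), ("⊥",)),
}


def substitute(body, env):
    """Replace the variables of a production body by closed expressions."""
    if isinstance(body, str):
        return env[body]
    head, *args = body
    return (head, *(substitute(arg, env) for arg in args))


def step(expr, label):
    """One labelled transition X E1..Em -a-> E[E1/α1, ..., Em/αm], or None."""
    head, *args = expr
    production = PRODUCTIONS.get((head, label))
    if production is None:
        return None
    params, body = production
    return substitute(body, dict(zip(params, args)))


def accepts(word, start=("X", ("A",), ("B",))):
    """A word is accepted if it labels a path from X A B to the constant ⊥."""
    expr = start
    for label in word:
        expr = step(expr, label)
        if expr is None:
            return False
    return expr == ("⊥",)
```

For instance, `accepts("lara")` and `accepts("llbrrb")` hold, while `accepts("larb")` does not: the arguments R α and R β act as two stacks that remember how many ℓ were read, and the choice of a or b selects which stack is then unwound.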

v ::= c | x | λx: T.t | rec x: T.v | Λα: κ.v | {lᵢ = vᵢ} | ⟨l = v⟩ as T 
    | receive[T] | receive[T][T] | send[T] | send[T] v | send[T] v [T] 
t ::= v | t t | t[T] | {lᵢ = tᵢ} | let {lᵢ = xᵢ} = t in t 
    | ⟨l = t⟩ as T | case t of t | match t with t 
p ::= ⟨t⟩ | p | p | (νxy)p 

c ::=                     Type of the constant            Term constant 
receive                   ∀α: T.∀β: s.?α.β → α ⊗ β        receive on a channel 
send                      ∀α: T.α → ∀β: s.!α.β → β        send on a channel 
select lⱼ as ⊕{lᵢ: Tᵢ}    ⊕{lᵢ: Tᵢ} → Tⱼ                  internal choice 
close                     End → Unit                      channel close 
fork                      (Unit → Unit) → Unit            fork a new thread 
new                       ∀α: s.α ⊗ Dual α                channel creation 


Fig. 10: Terms and types for term constants. 


6 The term language and its metatheory 


This section briefly introduces a concurrent functional language equipped with 
F^{μ∗;}_ω types, together with its metatheory. The results mostly follow from those in 
the literature, although explicit recursion at the term level and the unrestricted 
bindings in typing contexts are somewhat new in session types. The complete 
set of rules is to be found in the technical report [20]. 

The syntax of terms and processes is defined by the grammar in Fig. 10. The 
same figure introduces types for the constants. The term language is essentially 
the polymorphic lambda calculus with support for session operators, formulated 
as in Almeida et al. and Cai et al. [2,14]. From System F it comprises terms and 
type abstractions, records and variants, including constructors and destructors in 
each case. The support for session operations and concurrency includes channel 
creation (new), the different channel operations (receive, send, match, select and 
close) and thread creation (fork). We program at the term level and use processes 
only for the runtime. Processes include terms as threads, parallel composition 
and channel creation, all inspired by the pi-calculus with double binders [73]. 

Process typing and an excerpt of term typing are in Fig. 11. A judgement of 
the form Δ | Γ ⊢ t : T records the fact that term t has type T under contexts Δ 
(recording kinds for type variables) and Γ (recording types for term variables). 
The judgement for processes, Γ ⊢ p, says that p is well-typed under context Γ. 
It simplifies that for terms, since processes feature no free type variables and 
are assigned no particular type. Once again, the rules are adapted from the two 
above cited works. The difference to Cai et al. is that we work in a linear setting 
and hence axioms (T-Const and T-Var) work on an empty context, and most 
of the other rules must split the context accordingly. Rule T-TAbs simplifies 
that of Cai et al.; we can easily show that both rules are interchangeable. We 
support exponentials [37] for recursive functions, so that one may write functions 
that feature more than one recursive call (good for consuming binary trees, for 
example) and branches that do not use the recursive function (for code that is 
supposed to terminate). Towards this end, we add an unrestricted binding x: T 
in term variable contexts, an explicit rule for rec (as opposed to making rec a 
constant as in Cai et al. [14]) and substructural rules for unrestricted bindings 
(T-Dereliction, T-Weakening and T-Contraction). 

[Figure 11: inference rules not recoverable from the extracted text. The figure has two panels: term typing, Δ | Γ ⊢ t : T, with rules T-Const, T-Var, T-App, T-Rec, T-TAbs, T-Match, T-Eq, T-Dereliction, T-Weakening and T-Contraction; and process typing, Γ ⊢ p.] 

Fig. 11: Typing (excerpt). 

Thanks to the power of System F, most of the session and concurrency 
operators are expressed as constants. For example, receive receives a session 
type ?α.β with α, the payload of the message, an arbitrary type and β, the 
continuation, a session type, and returns a pair of the value received and the 
continuation channel. As usual ∀α: κ.T abbreviates the type ∀_κ (λα: κ.T). The 
exception is the external choice (T-Match) which cannot be captured by a type 
(similarly to T-Case) and hence requires a dedicated typing rule. 

Process reduction is in Fig. 12. Following Milner [55] we factor out processes 
by means of a structural congruence relation that accounts for the associative 
and commutative nature of parallel composition, scope extrusion and exchanging 
the order of channel bindings. We now address the metatheory of our language, 
starting with preservation for both terms and processes. 

[Figure 12: reduction rules not recoverable from the extracted text. The figure defines process reduction p → p: thread creation (fork), channel creation (new), communication between receive and send, choice between match and select, and close on the two endpoints of a channel bound by (νxy), together with closure under term reduction, parallel composition, channel binding and structural congruence.] 

Fig. 12: Process reduction. 


Theorem 6 (Preservation). 


1. If Δ | Γ ⊢ t : T and t → t′, then Δ | Γ ⊢ t′ : T. 
2. If Γ ⊢ p and p ≡ p′, then Γ ⊢ p′. 
3. If Γ ⊢ p and p → p′, then Γ ⊢ p′. 


Progress for the term language is assured only when the typing context contains 
channel endpoints only. When Δ is understood from the context we write 
Γ^s to mean that Γ contains only types of kind s, that is Δ ⊢ T : s for all types 
T in Γ. Well typed terms are values, or else they may reduce or are ready to 
reduce at the process level. Reduction in the case of session operations (receive, 
send, match, select, close) is pending a matching counterpart. 


Theorem 7 (Progress for the term language). If Δ | Γ^s ⊢ t : T, then t is 
a value, t reduces, or t is stuck in one of the following forms: E[fork v], E[new[T]], 
E[receive[T][U] x], E[send[T] v [U] x], E[match x with {lᵢ = tᵢ}], E[(select lⱼ as T) x], 
or E[close x]. 


In order to state our result on the absence of runtime errors we need a few 
notions on the structure of terms and processes; here we follow Almeida et al. [2]. 
The subject of an expression e, denoted by subj(e), is x in the following cases. 


receive[T][U] x     send[T] v [U] x     match x with t     (select lⱼ as T) x     close x 


Two terms e₁ and e₂ agree on channel xy, notation agree^{xy}(e₁, e₂), in the 
following cases (symmetric forms omitted). 


agree^{xy}(receive[T][U] x, send[V] v [W] y)     agree^{xy}(close x, close y) 
agree^{xy}(match x with {lᵢ = tᵢ}_{i∈I}, (select lⱼ as T) y)     j ∈ I 
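
The two notions just defined can be read as simple predicates on session operations. The sketch below abstracts away types and payloads and keeps only the operation, its subject and, for choices, the labels involved; the class names and this abstraction are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receive:
    channel: str


@dataclass(frozen=True)
class Send:
    channel: str


@dataclass(frozen=True)
class Close:
    channel: str


@dataclass(frozen=True)
class Match:
    channel: str
    labels: frozenset  # the labels offered by the branches


@dataclass(frozen=True)
class Select:
    channel: str
    label: str  # the label being selected


def subj(e):
    """The subject of a session operation is the endpoint it acts on."""
    return e.channel


def agree(x, y, e1, e2):
    """e1 on endpoint x and e2 on endpoint y are complementary operations."""
    if subj(e1) != x or subj(e2) != y:
        return False
    # Try both orientations, since the symmetric forms are also allowed.
    for a, b in ((e1, e2), (e2, e1)):
        if isinstance(a, Receive) and isinstance(b, Send):
            return True
        if isinstance(a, Match) and isinstance(b, Select) and b.label in a.labels:
            return True
    return isinstance(e1, Close) and isinstance(e2, Close)
```

With this reading, a receive on x agrees with a send on y, whereas a match on x offering Leaf and Node does not agree with a select of some other label on y.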


A closed process is a runtime error if it is structurally congruent to some 
process that contains a subexpression or subprocess of one of the following forms. 

1. v u where v is not a λ or a rec and v ≠ receive[T][U], send[T] u, send[T] w [U], 
select lⱼ as T, close, fork; 

2. v[T] where v is not a Λ and v ≠ receive, receive[U], send, send[U], new; 

3. let {lᵢ = xᵢ} = v in t and v is not of the form {lᵢ = uᵢ}; 

4. case v of t and v ≠ ⟨lⱼ = u⟩ as T or t ≠ {lᵢ = uᵢ}_{i∈I} with j ∈ I; 

5. receive[T][U] v or send[T] u [U] v or match v with t or (select l as T) v or close v 
and v is not an endpoint x; 

6. ⟨E₁[e₁]⟩ | ⟨E₂[e₂]⟩ and subj(e₁) = subj(e₂); 

7. (νxy)(⟨E₁[e₁]⟩ | ⟨E₂[e₂]⟩) and subj(e₁) = x, subj(e₂) = y, ¬agree^{xy}(e₁, e₂). 


The first four cases are standard to system F with records and variants. The 
support for session types and concurrency in the first two cases (term and type 
application) is derived from the types of values for such operators (Fig. 10). 
Item 5 addresses session operators applied to non endpoints. Item 6 is for two 
concurrent session operators on the same channel end. Finally, Item 7 is for 
mismatches on two session operations on two endpoints for the same channel. 


Theorem 8 (Safety). If Γ ⊢ p, then p is not a runtime error. 


An algorithmic typing system can be easily extracted from the declarative 
system for terms in Fig. 11 via a bidirectional type system, formulated along the 
lines of Almeida et al. [2]. 


7 Related Work 


Equirecursion in system F. In first investigations on equirecursive types, the no- 
tion of type equivalence is often formulated in a coinductive fashion [5,11,18,29,38]. 
Two types are equivalent if they unroll into the same infinite tree. Whenever this 
unrolling is the only type-level computation, such trees are regular, enabling ef- 
ficient decision procedures. Some authors have studied equirecursion together 
with other notions of type-level computation. Solomon considers parameterized 
type definitions, which correspond to higher-order kinds [63]. These implicitly 
correspond to A-terms, since reduction occurs as types are allowed to call other 
types. Some authors consider equirecursion in system F, with weaker or stronger 
notions of equality [1,12,14,41]. Regarding equirecursion in system F, the model 
of Cai et al. [14] is the closest to ours, and indeed our results up to F#* can 
be seen as a generalisation of theirs. However, Cai et al. depart from the usual 
setting by allowing non-contractive types (which most authors forbid, including 
this work), requiring a sort of infinitary lambda calculus. Moreover, this work 
further extends additional equivalence properties by including session types with 
their distinctive semantics, such as sequential composition and duality. 


Session type systems. Session types were introduced in the 90s by Honda et 
al. [42,43,67]. Equirecursion was the first approach used to construct infinite 
session types, which often allows type equality to be interpreted according to 
a coinductive notion of bisimulation [52]. In this vein, Keizer et al. [48] utilize 
coalgebras to represent session types. Since the inception of session types, there 
has been an interest in extending the theory to nonregular protocols [58,59,66]. 
Context-free session types emerged as a natural extension, as it still allowed for 
practical type equality algorithms [3,4,19,28,56,68]. Other approaches that go 
beyond regular session types include nested session types [24] as well as 1-counter, 
pushdown and 2-counter session types [33]. However, the more expressive notions 
are not amenable to practical type equivalence algorithms, just like the higher-order 
types present in our system F^μ_ω. Polymorphism in session types has also 
been a topic of interest, with or without recursion [15,22,23,31,39]. 


Dual type operator. This work is, to the best of our knowledge, the first that 
internalises duality as a type constructor. Other settings, such as the language 
Alms [72], consider duality for session types as a user-definable, not built in, 
type function. Our Dual is a type operator, not a type function. The difference 
is that a type function involves a type-level computation, which converges to a 
type written without dual. For example, in Alms we would have dual(!Int.End) = 
?Int.End (as a type-level computation), both sides being the same type. In our 
setting, Dual (!Int;End) is a type on its own, which happens to be equivalent 
to ?Int;End. At the same time, our setting allows for types such as Dual α, or 
(Dual α); T₁; T₂, which do not reduce. 


Type equivalence algorithms. Algorithms for deciding the equivalence of types 
must inherently be related to the computational power of the corresponding 
type system. This has been used implicitly or explicitly to obtain decidability 
results. As already explained, if equirecursion is the only type-level computation, 
types can be represented as finite-state automata (or equivalently, infinite regular 
trees). Although some exponential time algorithms were first proposed [32], it 
has been established that the problem can be solved in quadratic time [53], 
which is to be expected as it matches the corresponding problem of bisimulation 
of finite-state automata [44]; see also Pierce [57]. 
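To make the automata view concrete, the following Python sketch checks whether two finite type graphs unfold to the same infinite regular tree. It is only an illustration under stated assumptions: the graph encoding, the function name `equivalent` and the example types are hypothetical and are not taken from the cited algorithms. The search treats already-visited pairs as coinductive hypotheses, so it examines each pair of nodes at most once.

```python
from typing import Dict, List, Tuple

# A type graph maps a node name to (constructor label, ordered successor nodes).
# This encoding is an assumption made for the example.
TypeGraph = Dict[str, Tuple[str, List[str]]]


def equivalent(ga: TypeGraph, ra: str, gb: TypeGraph, rb: str) -> bool:
    """Check that two rooted graphs unfold to the same infinite regular tree."""
    assumed = set()          # pairs already assumed equal (coinductive hypotheses)
    todo = [(ra, rb)]
    while todo:
        a, b = todo.pop()
        if (a, b) in assumed:
            continue
        assumed.add((a, b))
        label_a, succ_a = ga[a]
        label_b, succ_b = gb[b]
        if label_a != label_b or len(succ_a) != len(succ_b):
            return False     # the unfoldings differ at this position
        todo.extend(zip(succ_a, succ_b))
    return True


# mu X. Int -> X
t1 = {"x": ("->", ["i", "x"]), "i": ("Int", [])}
# mu Y. Int -> (Int -> Y), an unrolled presentation of the same tree
t2 = {"y": ("->", ["i", "z"]), "z": ("->", ["i", "y"]), "i": ("Int", [])}
# mu W. Bool -> W, a different tree
t3 = {"w": ("->", ["b", "w"]), "b": ("Bool", [])}

same = equivalent(t1, "x", t2, "y")
different = equivalent(t1, "x", t3, "w")
```

Since the set of assumed pairs is bounded by the product of the two graph sizes, the loop terminates on every finite input.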

The next ‘simplest’ model of computation is that of simple grammars, which 
intuitively correspond to deterministic pushdown automata with a single state [33]. 
Almeida et al. [4] provided a practical algorithm for checking the bisimilar- 
ity of simple grammars. By dropping the determinism assumption, we arrive 
at Greibach normal form grammars, which are equivalent to basic process al- 
gebras [6,7]. Bisimilarity algorithms have been studied extensively in this set- 
ting [13,17,47,49]; presently it is known that the complexity of the problem lies 
between EXPTIME and 2-EXPTIME, which does not exclude the possibility of 
a polynomial time algorithm for the simpler model of simple grammars. 

In this paper we present a reduction from first-order grammars to Fi-types, 
showing that the more expressive type systems (F#, presented here and in Cai et 
al. [14], as well as its extensions) are at least as powerful as deterministic push- 
down automata. As far as we know, the closest result to ours is by Solomon [63], 
which shows conversions between a universe of “context-free types” and deter- 
ministic context-free languages. The universe of types studied by Solomon is 
different from F#. With some work we could prove that Solomon’s types can be 
embedded into F#, which would entail our result as a corollary. However, it is
easier and simpler to prove the reduction directly, as we did.

The equivalence problem for deterministic pushdown automata was a notorious
open problem for a long time, until Sénizergues showed it to be decidable
[61,62]. Since his proof, many authors have tried to refine the result in an
attempt to arrive at an implementable algorithm [46,64,65].


Concurrent term languages. The usefulness of a type system is directly related 
to its capability to be used in a programming language. Type systems such as 
the ones discussed in this work lend themselves quite readily to functional term 
languages [45]. For session types, existing term languages are inspired either
by the pi calculus [26,73,69], by the lambda calculus [35,54,70], or by
both [71]. The system presented in this paper is linear, meaning that resources
must be used exactly once [50,74]. Some authors go beyond linearity by
considering unrestricted type qualifiers [48,73] or manifest sharing [8].


8 Conclusion and future work 


This paper introduces an extension of system F which includes equirecursion, 
lambda abstractions, and context-free session types. We present type equivalence 
algorithms, and a term language and its metatheory. Although we have defined a 
rather general system, it turns out that for practical purposes one must restrict 
recursion to H+, that is, to type-level monomorphic recursion. In any case, the 
main system F'#*' is a non-trivial extension of (the contractive fragment of) Fi 
(studied by Cai et al. [14]) as well as F” (studied by Almeida et al. [19]). 

We have only considered polymorphic types of a functional nature: type
∀α: κ. T must always be of kind T. It is worth investigating polymorphism
over session types, as it would allow additional behaviour. For example,
we could be interested in streaming values of heterogeneous nature, as in
type μα: S. &{Done: Skip, More: ∀β: T. ?β;α}. It is however unclear whether
this extension would still allow a translation into a simple grammar.

We proved that the type equivalence problem for systems F#, FH, FH is at 
least as hard as a non-efficiently-decidable problem. We conjecture that these 
systems have the same power as deterministic pushdown automata (and hence, 
admit decidable type equivalence), but we do not have a construction to prove 
this result. In any case, our proof that the type equivalence problem is at least as 
hard as the bisimilarity of deterministic pushdown automata is enough to justify 
focus on the significant fragment with restricted recursion. 

We study either full recursion (for theoretical results) or recursion limited 
to kind ∗ (for algorithmic results). It would be interesting to study in-between
kinds of recursion; the next natural example is recursion at kind ∗ ⇒ ∗. What
model of computation
would we arrive at if we consider types written with this recursion operator? We
conjecture that types F# and F#', when restricted to recursion of kind ∗ ⇒ ∗,
would still be expressible as simple grammars, whereas such a restriction in 
the more powerful F# would take us beyond this model, but perhaps without 
reaching the expressivity of deterministic pushdown automata.

We introduce CLASS, a session-typed, higher-order, core language that supports
concurrent computation with shared linear state. We believe that CLASS
is the first proposal for a foundational language able to flexibly express realistic 
concurrent programming idioms, with a type system ensuring all the following 
three key properties: CLASS programs never misuse or leak stateful resources 
or memory, they never deadlock, and they always terminate. CLASS owes these 
strong properties to a propositions-as-types foundation based on Linear Logic, 
which we conservatively extend with logically motivated constructs for share- 
able affine state. We illustrate CLASS expressiveness with several examples 
involving memory-efficient linked data structures, sharing of resources with 
linear usage protocols, and sophisticated thread synchronisation, which may 
be type-checked with a perhaps surprisingly light type annotation burden. 


1 Introduction 


Stateful programming involving concurrency and shared state plays a prominent 
role in modern software development, but, in practice, getting concurrent code 
right is still quite hard for common developers. Typical sources of “bugs” include 
resource leaks (forgetting to release unused memory or close a socket), violation 
of resource state preconditions (writing to a closed file or sending out-of-order 
messages), races (data invariant breaking, erratic sharing of resources), dead- 
locks (indefinite wait for lock release or incoming messages), livelocks, and even 
general non-termination. Fifty years ago Hoare noted [40]: “Parallel programs 
are particularly prone to time-dependent errors, which either cannot be detected 
by program testing nor by run-time checks. It is therefore very important that 
a high-level language designed for this purpose should provide complete
security against time-dependent errors by means of a compile-time check”. It is
no surprise that finding ways to approximate such an ambitious goal is still
the object of intense research today.

In this paper, we approach this challenge by leveraging the propositions- 
as-types (PaT) paradigm towards the realm of concurrency and shared state. 
PaT is known to offer a unifying framework connecting logic, computation, and 
programming languages. Since the seminal work of Curry and Howard [42], it 
is a prolific structuring concept for designing and reasoning about programming 
languages (see [82]). Remarkably, languages derived within PaT intrinsically 
satisfy crucial properties: type preservation (since reduction corresponds to
cut-reduction), confluence (since computation corresponds to proof simplification),
deadlock freedom (as a consequence of cut-elimination) and livelock freedom /
termination (as a consequence of strong normalisation). 

Although PaT has a traditional focus on functional computation, the emer- 
gence of linear logic has progressively motivated interpretations of stateful/re- 
sourceful computation [78,1,14,2,12], eventually leading to the discovery of tight 
correspondences between session types and linear logic [22,27,81]. These systems 
already capture aspects of state change, namely in the sequential execution of 
session protocols, thus raising the question of whether such approaches could 
be extended to express notions of shared mutable state, subject to interference, 
as found in typical imperative and concurrent programs. Recently, this
challenge was addressed by several works [9,64,67]. In particular, [67] developed a
first basic shared state model enjoying all the aforementioned strong properties 
of PaT. However, although [67] supports higher-order shareable store for pure 
values of replicated type, it forbids linear objects, such as stateful processes or 
data structures with update in-place, to be stored and shared as in languages 
like Java, Rust, and in the CLASS core language we introduce herein. 

In this work, we develop a novel, more fundamental approach to shared state 
and PaT, and introduce CLASS, a typed, higher-order, session based core lan- 
guage that supports general concurrent computation with dynamically allocated 
shared linear (more precisely, affine) state. We believe that CLASS is the first 
proposal for a foundational language able to flexibly express realistic
concurrent programming idioms, while ensuring all the following three key properties
by static typing: CLASS programs never misuse or leak stateful resources or 
memory, they never deadlock, and they always terminate. 

Despite the strength of its type system, the expressiveness and effectiveness
of CLASS substantially overcome the limitations of related works, as we show with
compelling program examples that can be algorithmically typed for memory
safety, deadlock freedom and livelock freedom with a perhaps surprisingly light
type annotation burden. CLASS owes these strong properties to its PaT foundation
based on Second-Order Linear Logic, already known to capture the polymorphic
session calculus and the linear System F [74], which we conservatively extend
with novel logically motivated constructs for shareable affine state. These
constructs are also based on DiLL co-exponentials [35,67], but we give them here
a different, more general and fundamental interpretation.


1.1 Overview 


A main novelty of CLASS, and a source of its expressiveness, flexibility and
strong meta-theoretical properties, is its mechanism for shared state
composition. We overview this mechanism in the context of the basic
composition and interaction principles of the fundamental linear logic
interpretations [22,27,81]. Our computational model is structured around processes that
interact via binary sessions, the basic composition rules being mix and cut.


$$
\frac{P \vdash \Delta_1; \Gamma \qquad Q \vdash \Delta_2; \Gamma}
     {P \parallel Q \vdash \Delta_1, \Delta_2; \Gamma}\ [\text{Tmix}]
\qquad
\frac{P \vdash \Delta_1, x : \overline{A}; \Gamma \qquad Q \vdash \Delta_2, x : A; \Gamma}
     {P \;|x|\; Q \vdash \Delta_1, \Delta_2; \Gamma}\ [\text{Tcut}]
$$

The mix rule types the independent composition of processes P and Q, which
do not share any free names and run side-by-side without interacting. This is
captured by the implicit disjointness of their linear typing contexts Δ1 and
Δ2, which declare the types of their interaction channels. Interactive composition is
expressed by the cut rule, which connects exactly two processes P and Q through
a single linear session x with dually typed endpoints (x : Ā and x : A), following
Abramsky’s idea of “cut as interactive composition” [1].

Intuitively, duality of endpoint (session) types ensures that all interactions
between P and Q on x always match: when P sends, Q receives; when Q offers,
P chooses; and likewise for all types. Notice that sharing a single channel x
between the threads P and Q is important to ensure acyclicity of proof structures,
and hence cut-elimination and deadlock absence. But P and Q may use an arbitrary
number of linear channels, in Δ1 and Δ2, to also compose with other processes.

Shared composition in session types is available for replicated “server” objects
!x(y); P, typed by the linear logic exponential type bang !A. Contraction of the
dual exponential type why-not ?Ā allows an unbounded number of usages of
such a replicated server object to be introduced in client processes. In the dyadic
presentation of linear logic (cf. [5,11]), contraction is expressed by moving
?-typed names into the unrestricted context Γ, with the [T?] rule.


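$$
\frac{!x(y); P \vdash x : {!A}; \Gamma
      \qquad
      \dfrac{Q \vdash \Delta; \Gamma, x : \overline{A}}
            {?x; Q \vdash \Delta, x : {?\overline{A}}; \Gamma}\,[\text{T?}]}
     {!x(y); P \;|x|\; ?x; Q \vdash \Delta; \Gamma}
\qquad
\frac{R \vdash \Delta, y : \overline{A}; \Gamma, x : \overline{A}}
     {\text{call } x(y); R \vdash \Delta; \Gamma, x : \overline{A}}\ [\text{Tcall}]
$$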


Names in Γ may be used unrestrictedly; each call (typed by [Tcall]) spawns a
fresh copy of the server body at type y : A, to be used by the client at type
y : Ā, in a linear binary session. By the typing rule for !A (promotion), such a copy
does not depend on linear resources. Thus, interaction with replicated objects,
as captured by the exponentials ! and ?, implements a copy semantics where
each call obtains a new private stateless copy of the same object.
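The copy semantics can be pictured with a small Python sketch. It is an informal analogy under stated assumptions, not CLASS code: the helper `make_server` and the dictionary used as a server body are hypothetical names chosen for the example. Each call re-runs the server body, so clients never observe each other's changes.

```python
from typing import Callable, Dict


def make_server(body: Callable[[], Dict[str, int]]) -> Callable[[], Dict[str, int]]:
    """Model a replicated server: the body is re-run for every call."""
    def call() -> Dict[str, int]:
        # Each call yields a fresh private copy, independent of earlier calls.
        return body()
    return call


counter_server = make_server(lambda: {"count": 0})

first = counter_server()    # one client session
second = counter_server()   # another, independent client session
first["count"] += 1         # updating one copy does not affect the other
```

This is exactly what makes replicated objects unsuitable for shared mutable state, which motivates the third composition mechanism introduced next.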

In this work, we introduce a third composition mechanism, allowing processes
to concurrently share mutex memory cells storing linear state. Mutex memory
cells and their usages are typed respectively by a pair of dual modalities S_f A and
U_f A, whose logical rules are motivated by Differential Linear Logic (DiLL) [35],
in particular cocontraction, expressed by the typing rule [Tsh].


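$$
\frac{P \vdash \Delta, x : U_f A; \Gamma \qquad Q \vdash \Delta', x : U_f A; \Gamma}
     {\text{share } x\ \{P \parallel Q\} \vdash \Delta, \Delta', x : U_f A; \Gamma}\ [\text{Tsh}]
$$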


While sharing of replicated objects corresponds to contraction of ?A types,
shared usage of mutex cells corresponds to cocontraction of U_f A types. Apart
from the explicit use of [Tsh], the type system ensures that memory cells are
always used linearly. The shared usage x : U_f A is free in the conclusion of the
typing rule; therefore a memory cell may be shared by an arbitrary number of
processes, by nested iterated use of cocontraction.
Moreover, cocontraction also ensures that concurrent processes may only share a
single mutex cell (just like [Tcut] w.r.t. binary sessions). This constraint comes
from the linear logic discipline, and it is important to ensure deadlock freedom.
As discussed in Concluding Remarks, this does not hinder the expressiveness of
CLASS: for example, a single mutex cell may act as a gateway to further bundles
of shared state, organised in resource hierarchies, as our examples illustrate.
It even suggests convenient techniques for structuring concurrent programs.

To access a mutex memory cell in its (unlocked) full state, typed by U_f A, the
client uses a take operation. Take waits to acquire the cell lock and reads its
contents. The cell then transitions to the (locked) empty state, typed by U_e A.
The taking client becomes solely responsible for refilling the cell,
using a put operation. This restores the cell to the full state, releasing its
lock and making it accessible to other concurrent threads waiting to take it.
Our mutex memory cell object is thus akin to a behaviourally typed incarnation
of Concurrent Haskell MVars [45] or Rust std::sync::Mutex objects [46].
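The take/put discipline can be approximated at run time by a one-slot cell in the style of an MVar. The Python sketch below is only an analogy under stated assumptions: the class `MutexCell` and the helper `bump` are hypothetical names, and the check in `put` is a dynamic stand-in for what CLASS guarantees statically through the full and empty usage types.

```python
import threading


class MutexCell:
    """One-slot cell: take empties (locks) it, put refills (unlocks) it."""

    def __init__(self, contents):
        self._cond = threading.Condition()
        self._full = True
        self._contents = contents

    def take(self):
        """Block until the cell is full, then empty it and return its contents."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            self._full = False
            contents, self._contents = self._contents, None
            return contents

    def put(self, contents):
        """Refill an empty cell and wake one thread blocked in take."""
        with self._cond:
            if self._full:
                # CLASS rules this out by typing; here it is a runtime check.
                raise RuntimeError("put on a full cell")
            self._contents = contents
            self._full = True
            self._cond.notify()


def bump(cell, times):
    for _ in range(times):
        value = cell.take()      # critical section starts: the cell is empty
        cell.put(value + 1)      # critical section ends: the cell is full again


cell = MutexCell(0)
workers = [threading.Thread(target=bump, args=(cell, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
total = cell.take()
```

Because every increment happens between a take and the matching put, no update is lost even though four threads compete for the cell.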

To ensure that a memory cell can be released safely, its contents are required
to be of affine type ∧A. Affine objects are well-behaved disposable values that,
when discarded, safely dispose of all resources they hereditarily refer to, this
being ensured by the linear logic typing.

We illustrate the introduced concepts with a simple example, where two
concurrent threads compete to turn on a flag that is initially off, but only one
may win. The flag iteratively announces its state to the client with either #Off or
#On. If the state is off, the client must select #turnOn; if the state is on, it will
remain on. Process flag(f) implements the flag (at name f) in the off state, and
process on(f) in the on state, defined as follows:


flag(f) = #Off f; case f { |#turnOn: affine f; on(f) }
on(f)   = #On f; affine f; on(f)


The flag object is typed with the (linear) usage protocol defined by the
coinductive type Flag below, such that flag(f) ⊢ f : Flag and on(f) ⊢ f : Flag:


type corec Flag = ⊕{ |#Off : &{ |#turnOn : ∧Flag }, |#On : ∧Flag }


We now consider a scenario where a flag object is shared between two concurrent
clients via a mutex memory cell c, which initially stores an off flag of type ∧Flag.


client(c, id) ⊢ c : U_f Flag; id : int          main() ⊢ ∅

client(c, id) =
    take c(f);
    case f {
        |#Off : println id + ": wins.";
                #turnOn f;
                put c(f); release c
        |#On  : println id + ": loses.";
                put c(f); release c }

main() =
    cut { cell c(f. affine f; flag(f))
          |c : U_f Flag|
          share c { client(c, 1) || client(c, 2) } }

When running main(), exactly one of the threads (executing the same code, just
with a different id) will turn the flag on and win; the other will lose. Notice
that all threads drop their usage of the memory cell c using release, which
corresponds to DiLL coweakening [35].
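For readers who prefer a mainstream rendering, the race can be imitated in Python with a one-slot queue playing the role of the mutex cell. This is a rough analogy under stated assumptions, not a translation of the CLASS program: the function `run_flag_race`, the string encoding of the flag state and the result dictionary are all hypothetical, and nothing here is checked statically.

```python
import queue
import threading


def run_flag_race(n_clients=2):
    """Let n_clients compete to turn on a flag stored in a one-slot cell."""
    cell = queue.Queue(maxsize=1)   # one-slot queue standing in for the cell
    cell.put("Off")                 # the cell initially stores an off flag
    results = {}

    def client(ident):
        flag = cell.get()           # take: blocks until the cell is full
        if flag == "Off":
            results[ident] = "wins"
            flag = "On"             # corresponds to selecting #turnOn
        else:
            results[ident] = "loses"
        cell.put(flag)              # put: refill the cell, releasing the lock

    threads = [threading.Thread(target=client, args=(i,))
               for i in range(1, n_clients + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, cell.get()


results, final_state = run_flag_race()
```

Whichever thread takes the cell first sees the off flag and wins; the scheduling decides which one that is, but never how many.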

When considering a new language, in particular with a static typing disci- 
pline, it is necessary to argue about its expressiveness, and aim for a better per- 
ception of how natural programs get past its typing rules, and of how types help 
in structuring programs. In this paper, we approach these concerns by showcas- 
ing many interesting examples that challenge the expressiveness of the CLASS 
language and type system on realistic concurrent programming scenarios. We 
have developed many more examples, distributed with our implementation [68], 
combining imperative, higher-order functional, and session-based programming 
styles. For all these programs, strong guarantees of memory safety, deadlock- 
freedom, termination, and absence of “dynamic bugs”, even in the presence of 
blocking primitives and higher-order state, are compositionally certified by our 
lightweight type discipline based on Propositions-as-Types and Linear Logic. 


1.2 Outline and Contributions 


We believe that CLASS is the first proposal for a foundational language able to 
flexibly express realistic concurrent programming idioms while ensuring by typ- 
ing three key properties: CLASS programs never misuse or leak stateful resources 
or memory, they never deadlock, and they always terminate. 

In Section 2 we formally present the core language CLASS, its type system and 
operational semantics. Our model builds on the propositions-as-types approach 
to session-based concurrency [22,27,80], extending Second-Order Classical Linear 
Logic with inductive/coinductive types, affine types, and novel primitives for 
shareable first-class mutex reference cells for linear state. 

In Section 3 we state and prove type preservation (Theorem 1), progress 
(Theorem 2) which implies deadlock-freedom, and strong normalisation (Theo- 
rem 3), which also implies livelock absence. Our proof uses a logical relations 
argument, extended with an interesting technique to handle shared state inter- 
ference, which we believe is exploited here for the first time. 

Given the strong properties of its type system, it is of course very important 
to substantiate our claims about CLASS expressiveness. In Section 4 we illustrate 
the expressiveness of the CLASS language and type system by going through a series
of compelling examples. Namely, we discuss a general technique for sharing linear 
protocols, a shareable linked list with update in-place, a shareable buffered chan- 
nel, using a linked list with pointers to tail and head nodes, and executing send 
and receive operations in O(1) time; the dining philosophers, illustrating tech- 
niques that rely on our type structure to encode resource acquisition hierarchies; 
a generic barrier for n threads; and a Hoare-style monitor with await/notify
conditions, where our implementation of the condition’s process queue is supported
by a dynamic linked data structure, as in real systems code.

Section 5 discusses related work. Section 6 offers concluding remarks and
suggests further research. Complete definitions and detailed proofs of all results
are provided in [65].


2 The Core Language and its Type System 


We present the core language, type system, and operational semantics of CLASS. 
The language is based on a PaT correspondence with Linear Logic, so terms of 
the language correspond to proof rules. We start with types and duality.


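Definition 1 (Types). Types A, B of CLASS are defined by

$$
\begin{array}{rcl}
A, B & ::= & X \mid \mathbf{1} \mid \bot \mid A \otimes B \mid A \parr B \mid A \oplus B \mid A \mathbin{\&} B \\
     & \mid & {!A} \mid {?A} \mid \exists X. A \mid \forall X. A \mid \mu X. A \mid \nu X. A \\
     & \mid & \wedge A \mid \vee A \mid S_f A \mid S_e A \mid U_f A \mid U_e A
\end{array}
$$
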

Types in the first two rows correspond to Second-Order Classical Linear Logic,
extended with inductive/coinductive types (μ, ν). Types comprise variables (X),
units (1, ⊥), multiplicatives (⊗, ⅋), additives (⊕, &), exponentials (!, ?) and
quantifiers (∃, ∀). The third row extends basic types with affine (∧, ∨) and new
modalities (S_f, U_f, S_e, U_e) to type shared affine state. Duality is the involution
operation A ↦ Ā on types, corresponding to Linear Logic negation, defined by


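$$
\begin{array}{lll}
\overline{\mathbf{1}} = \bot &
\overline{A \otimes B} = \overline{A} \parr \overline{B} &
\overline{A \oplus B} = \overline{A} \mathbin{\&} \overline{B} \\
\overline{!A} = {?\overline{A}} &
\overline{\exists X. A} = \forall X. \overline{A} &
\overline{\mu X. A} = \nu X. \overline{\{\overline{X}/X\}A} \\
\overline{\wedge A} = \vee \overline{A} &
\overline{S_f A} = U_f \overline{A} &
\overline{S_e A} = U_e \overline{A}
\end{array}
$$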
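The variable-free, non-recursive part of these equations can be exercised with a short Python sketch. It rests on stated assumptions: types are encoded as nested tuples with hypothetical tag names, and type variables, quantifiers and μ/ν are deliberately left out because they require the substitution shown in the equation for μ. Swapping each constructor with its partner and recursing on the arguments makes the involution property easy to observe.

```python
# Each constructor tag is mapped to its dual; the tag names are assumptions
# chosen for this example ("aff" for the affine modality, "Sf"/"Uf" for full
# cell and usage, "Se"/"Ue" for empty cell and usage).
DUAL_TAG = {
    "one": "bot", "bot": "one",
    "tensor": "par", "par": "tensor",
    "plus": "with", "with": "plus",
    "bang": "whynot", "whynot": "bang",
    "aff": "coaff", "coaff": "aff",
    "Sf": "Uf", "Uf": "Sf",
    "Se": "Ue", "Ue": "Se",
}


def dual(t):
    """Dualise a type given as a nested tuple (tag, argument, ...)."""
    tag, *args = t
    return (DUAL_TAG[tag], *(dual(a) for a in args))


# S_f (1 (+) /\ bot): a full cell storing a choice
cell_type = ("Sf", ("plus", ("one",), ("aff", ("bot",))))
usage_type = dual(cell_type)
round_trip = dual(usage_type)
```

Applying `dual` twice returns the original type because the tag table is its own inverse.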


Duality captures symmetry in process interaction, as manifest in the cut rule.
In our system, typing judgements have the form P ⊢_η Δ; Γ. The typing context
Δ; Γ is dyadic [4,15,63,22], where Δ is handled linearly and Γ is unrestricted;
both Δ and Γ assign types to names. The index η is a finite map that holds
the coinductive hypotheses used to type corecursive processes, as detailed later.


Definition 2. The typing rules of CLASS are presented in Figs. 1 to 5. 


The type system corresponds, via propositions-as-types [22,27,80], to Second- 
Order Classical Linear Logic (Fig. 1) with inductive/coinductive types (Fig. 2), 
affinity (Fig. 3) and extended with constructs for shared mutable state (Figs. 4 
- 5). The basic composition rules are [Tmix] and [Tcut], which correspond to 
mix and cut of Linear Logic, respectively. [Tmix] types a parallel composition 
P || Q, where P and Q run in parallel without interfering. On the other hand, 
[Tcut] types linear interactive composition P |x : A| Q: processes P and Q 
run concurrently and communicate through a private linear session x, session
endpoints being typed by dual types Ā and A. When the cut type annotation does
not play any role, we may omit it and write P |x| Q. In examples, for readability, 
we use cut {P |x| Q} and par {P || Q} instead of P |x| Q and P || Q, respectively. 

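For the basic process constructs [22,27,80,19], ⊗/⅋ type send and receive,
⊕/& type choice and offer (in examples we use labelled choice), !/? type

$$
\begin{gathered}
\dfrac{}{\mathbf{0} \vdash_\eta \emptyset; \Gamma}\,[\text{T0}]
\qquad
\dfrac{P \vdash_\eta \Delta'; \Gamma \quad Q \vdash_\eta \Delta; \Gamma}
      {P \parallel Q \vdash_\eta \Delta', \Delta; \Gamma}\,[\text{Tmix}]
\\[1.5ex]
\dfrac{}{\text{fwd } x\ y \vdash_\eta x : \overline{A}, y : A; \Gamma}\,[\text{Tfwd}]
\qquad
\dfrac{P \vdash_\eta \Delta', x : \overline{A}; \Gamma \quad Q \vdash_\eta \Delta, x : A; \Gamma}
      {P \;|x : A|\; Q \vdash_\eta \Delta', \Delta; \Gamma}\,[\text{Tcut}]
\\[1.5ex]
\dfrac{}{\text{close } x \vdash_\eta x : \mathbf{1}; \Gamma}\,[\text{T1}]
\qquad
\dfrac{Q \vdash_\eta \Delta; \Gamma}
      {\text{wait } x; Q \vdash_\eta \Delta, x : \bot; \Gamma}\,[\text{T}\bot]
\\[1.5ex]
\dfrac{P_1 \vdash_\eta \Delta, x : A; \Gamma \quad P_2 \vdash_\eta \Delta, x : B; \Gamma}
      {\text{case } x\ \{|\text{inl} : P_1, |\text{inr} : P_2\} \vdash_\eta \Delta, x : A \mathbin{\&} B; \Gamma}\,[\text{T}\&]
\\[1.5ex]
\dfrac{Q_1 \vdash_\eta \Delta, x : A; \Gamma}
      {x.\text{inl}; Q_1 \vdash_\eta \Delta, x : A \oplus B; \Gamma}\,[\text{T}\oplus_l]
\qquad
\dfrac{Q_2 \vdash_\eta \Delta, x : B; \Gamma}
      {x.\text{inr}; Q_2 \vdash_\eta \Delta, x : A \oplus B; \Gamma}\,[\text{T}\oplus_r]
\\[1.5ex]
\dfrac{P_1 \vdash_\eta \Delta_1, y : A; \Gamma \quad P_2 \vdash_\eta \Delta_2, x : B; \Gamma}
      {\text{send } x(y.P_1); P_2 \vdash_\eta \Delta_1, \Delta_2, x : A \otimes B; \Gamma}\,[\text{T}\otimes]
\qquad
\dfrac{Q \vdash_\eta \Delta, z : A, x : B; \Gamma}
      {\text{recv } x(z); Q \vdash_\eta \Delta, x : A \parr B; \Gamma}\,[\text{T}\parr]
\\[1.5ex]
\dfrac{P \vdash_\eta y : A; \Gamma}
      {!x(y); P \vdash_\eta x : {!A}; \Gamma}\,[\text{T!}]
\qquad
\dfrac{Q \vdash_\eta \Delta; \Gamma, x : A}
      {?x; Q \vdash_\eta \Delta, x : {?A}; \Gamma}\,[\text{T?}]
\\[1.5ex]
\dfrac{P \vdash_\eta y : \overline{A}; \Gamma \quad Q \vdash_\eta \Delta; \Gamma, x : A}
      {y.P \;|{!x} : A|\; Q \vdash_\eta \Delta; \Gamma}\,[\text{Tcut!}]
\qquad
\dfrac{Q \vdash_\eta \Delta, z : A; \Gamma, x : A}
      {\text{call } x(z); Q \vdash_\eta \Delta; \Gamma, x : A}\,[\text{Tcall}]
\\[1.5ex]
\dfrac{P \vdash_\eta \Delta, x : \{B/X\}A; \Gamma}
      {\text{sendty } x(B); P \vdash_\eta \Delta, x : \exists X. A; \Gamma}\,[\text{T}\exists]
\qquad
\dfrac{Q \vdash_\eta \Delta, x : A; \Gamma}
      {\text{recvty } x(X); Q \vdash_\eta \Delta, x : \forall X. A; \Gamma}\,[\text{T}\forall]
\end{gathered}
$$

Fig. 1: Typing Rules I: Second-Order CLL.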


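$$
\begin{gathered}
\dfrac{P \vdash_{\eta'} \Delta, z : A; \Gamma \qquad \eta' = \eta, X(z, \vec{w}) \mapsto \Delta, z : Y; \Gamma}
      {\text{corec } X(z, \vec{w}); P\ [x, \vec{y}] \vdash_\eta \{\vec{y}/\vec{w}\}\Delta, x : \nu Y. A; \{\vec{y}/\vec{w}\}\Gamma}\,[\text{Tcorec}]
\\[1.5ex]
\dfrac{\eta = \eta', X(x, \vec{y}) \mapsto \Delta, x : Y; \Gamma}
      {X(z, \vec{w}) \vdash_\eta \{\vec{w}/\vec{y}\}\Delta, z : Y; \{\vec{w}/\vec{y}\}\Gamma}\,[\text{Tvar}]
\\[1.5ex]
\dfrac{P \vdash_\eta \Delta, x : \{\mu X. A/X\}A; \Gamma}
      {\text{unfold}_\mu\ x; P \vdash_\eta \Delta, x : \mu X. A; \Gamma}\,[\text{T}\mu]
\qquad
\dfrac{P \vdash_\eta \Delta, x : \{\nu X. A/X\}A; \Gamma}
      {\text{unfold}_\nu\ x; P \vdash_\eta \Delta, x : \nu X. A; \Gamma}\,[\text{T}\nu]
\end{gathered}
$$

Fig. 2: Typing Rules II: Induction and Coinduction.

$$
\begin{gathered}
\dfrac{P \vdash_\eta a : A, \vec{b} : \vee\vec{B}, \vec{c} : U_f \vec{C}; \Gamma}
      {\text{affine}_{\vec{b}, \vec{c}}\ a; P \vdash_\eta a : \wedge A, \vec{b} : \vee\vec{B}, \vec{c} : U_f \vec{C}; \Gamma}\,[\text{Taffine}]
\\[1.5ex]
\dfrac{}{\text{discard } a \vdash_\eta a : \vee A; \Gamma}\,[\text{Tdiscard}]
\qquad
\dfrac{Q \vdash_\eta \Delta, a : A; \Gamma}
      {\text{use } a; Q \vdash_\eta \Delta, a : \vee A; \Gamma}\,[\text{Tuse}]
\end{gathered}
$$

Fig. 3: Typing Rules III: Affinity.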


replicated servers and their invocation, ∀/∃ type receive and send of types,
implementing polymorphic processes.

Coinductive types are introduced by rule [Tcorec]. It types corecursive
processes corec X(z, w); P [x, y], with parameters z, w bound in P, that are
instantiated with the arguments x, y (free in the process term). By convention, the
coinductive behaviour, of type νY. A, of a corecursive process is always offered
in the first argument z. According to [Tcorec], to type the body P of a
corecursive process, the map η is extended with a coinductive hypothesis binding
the process variable X to the typing context Δ, z : Y; Γ, so that when typing
the body P of the corecursion we can appeal to X, which intuitively stands for
P itself, and recover its typing invariant. Crucially, the type variable Y is free
only in z : A. This causes corecursive calls to be always applied to names z′ that
hereditarily descend from the initial corecursive argument z, a necessary
condition for strong normalisation (Theorem 3), and morally corresponds to only
allowing corecursive calls on “smaller” argument sessions (of inductive type).

Rule [Tvar] types a corecursive call X(z, w) by looking up the corresponding
binding in η and renaming the parameters with the arguments of the call.
Inductive and coinductive types are explicitly unfolded with [Tμ] and [Tν].

To simplify the presentation in program examples, we omit explicit unfolding
actions, and write inductive and coinductive type definitions with equations of
the form rec A = f(A) and corec B = f(B) instead of A = μX. f(X) and
B = νX. f(X), respectively. Similarly, we write corecursive process definitions
as Q(x, y) = f(Q(−)) instead of Q(x, y) = corec X(z, w); f(X(−)) [x, y], while
of course respecting the constraints imposed by typing rules [Tvar] and [Tcorec].


Affinity. Affinity is important to model discardable linear resources, and plays
an important role in CLASS. An affine session can either be used as a linear
session or discarded. The typing rules for the affine modalities are in Fig. 3.
Affine sessions are introduced by rule [Taffine], which promotes a linear a : A to
an affine session a : ∧A. It types affine_{b,c} a; P, which provides an affine session
at a and continues as P, and follows the structure of a standard promotion rule.

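A session a may be promoted to affine if it only depends on resources that
can be disposed, i.e. resources that satisfy some form of weakening capability,
namely: coaffine sessions b_i of type ∨B_i, which can be discarded; full cell usages
c_i of type U_f C_i, which can be released; and unrestricted sessions in Γ, which
are implicitly ?-typed. The dependencies of an affine object on coaffine or full

$$
\begin{gathered}
\dfrac{P \vdash_\eta \Delta, a : \wedge A; \Gamma}
      {\text{cell } c(a.P) \vdash_\eta \Delta, c : S_f A; \Gamma}\,[\text{Tcell}]
\qquad
\dfrac{}{\text{release } c \vdash_\eta c : U_f A; \Gamma}\,[\text{Trelease}]
\\[1.5ex]
\dfrac{}{\text{empty } c \vdash_\eta c : S_e A; \Gamma}\,[\text{Tempty}]
\qquad
\dfrac{Q \vdash_\eta \Delta, a : \vee A, c : U_e A; \Gamma}
      {\text{take } c(a); Q \vdash_\eta \Delta, c : U_f A; \Gamma}\,[\text{Ttake}]
\\[1.5ex]
\dfrac{Q_1 \vdash_\eta \Delta_1, a : \wedge\overline{A}; \Gamma \qquad Q_2 \vdash_\eta \Delta_2, c : U_f A; \Gamma}
      {\text{put } c(a.Q_1); Q_2 \vdash_\eta \Delta_1, \Delta_2, c : U_e A; \Gamma}\,[\text{Tput}]
\end{gathered}
$$

Fig. 4: Typing Rules IV: Reference Cells.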


cell objects are explicitly annotated as b,c in the process term, to instrument 
the operational semantics, but we often omit them and simply write affine a; P. 

The coaffine endpoint of an affine session, of type ∨A (the dual of ∧Ā), has two
operations: use and discard. Rule [Tuse] types a process use a; Q that uses a
coaffine session a and continues as Q; it is a dereliction rule. Rule [Tdiscard]
types the process discard a that discards a coaffine session a; it is a weakening rule.
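The "use once or discard once" reading of a coaffine endpoint can be mimicked at run time, as in the Python sketch below. It is an informal analogy under stated assumptions: the class `Coaffine`, its methods and the cleanup callback are hypothetical names, and the single-consumption check that CLASS performs by typing is replaced here by a dynamic flag.

```python
class Coaffine:
    """Client side of an affine resource: use it once or discard it once."""

    def __init__(self, resource, on_discard):
        self._resource = resource
        self._on_discard = on_discard   # cleanup run when the resource is dropped
        self._spent = False

    def _spend(self):
        if self._spent:
            raise RuntimeError("coaffine endpoint already consumed")
        self._spent = True

    def use(self):
        """Dereliction: continue with the underlying linear resource."""
        self._spend()
        return self._resource

    def discard(self):
        """Weakening: drop the resource, running its cleanup."""
        self._spend()
        self._on_discard(self._resource)


freed = []
a = Coaffine("socket-1", freed.append)
b = Coaffine("socket-2", freed.append)
used = a.use()      # the client decides to use the first resource
b.discard()         # and to drop the second one, which triggers cleanup
```

The cleanup callback stands in for the hereditary disposal of resources that affine typing guarantees.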


Shared Mutable State. Shared state is introduced in CLASS by typed constructs
that model mutex memory cells, and associated cell operations allowing
their use by client code, defined by the typing rules in Fig. 4.

At any moment a cell may be either full or empty, akin to the MVars of
Concurrent Haskell [45]. A full cell on c, written cell c(a.P), is typed S_f A by
rule [Tcell]. Such a cell stores an affine session of type ∧A, implemented at a by
P. All objects stored in cells are required to be affine, so that memory cells may
always be safely disposed of without causing memory leaks. An empty cell on c, of
type S_e A and written empty c, is typed by rule [Tempty].

Client processes manipulate cells via take, put and release operations. These
operations apply to names of cell usage types, U_f A (full cell usage) and U_e A
(empty cell usage), which are the dual types of S_f Ā and S_e Ā, respectively. At any
given moment, a client thread owning a U_f A-typed usage of a cell may execute
a take operation, typed by rule [Ttake]. The take operation take c(a); Q waits
to acquire the cell mutex c, and reads its contents into parameter a, the linear
(actually coaffine, of type ∨A) usage for the object stored in the cell; the cell
becomes empty, and execution continues as Q.

It is the responsibility of the taking thread to put some value back in the empty
cell, thus releasing the lock and causing the cell to transition to the full state.
The put operation put c(a.Q1); Q2 is typed by [Tput]; the stored object a, implemented
by Q1, is required to be affine, as specified in the premise a : ∧Ā.

Hence a cell flips from full to empty and back: [Ttake] uses the cell c at type
U_f A, and its continuation (in the premise) at type U_e A; symmetrically, [Tput]
uses the cell c at type U_e A, and its continuation (in the premise) at type U_f A.

The release c operation allows a thread to manifestly drop its cell usage c.
Release is typed by [Trelease] (cf. coweakening [35]); a usage may only be released


430 P. Rocha and L. Caires 


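$$
\begin{gathered}
\dfrac{P \vdash_\eta \Delta', c : U_f A; \Gamma \qquad Q \vdash_\eta \Delta, c : U_f A; \Gamma}
      {\text{share } c\ \{P \parallel Q\} \vdash_\eta \Delta', \Delta, c : U_f A; \Gamma}\,[\text{Tsh}]
\\[1.5ex]
\dfrac{P \vdash_\eta \Delta', c : U_e A; \Gamma \qquad Q \vdash_\eta \Delta, c : U_f A; \Gamma}
      {\text{share } c\ \{P \parallel Q\} \vdash_\eta \Delta', \Delta, c : U_e A; \Gamma}\,[\text{TshL}]
\\[1.5ex]
\dfrac{P \vdash_\eta \Delta', c : U_f A; \Gamma \qquad Q \vdash_\eta \Delta, c : U_e A; \Gamma}
      {\text{share } c\ \{P \parallel Q\} \vdash_\eta \Delta', \Delta, c : U_e A; \Gamma}\,[\text{TshR}]
\end{gathered}
$$

Fig. 5: Typing Rules V: State Sharing.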


in the unlocked state U_f A. When, for some cell c, all the owning threads release
their usages, which eventually happens in well-typed programs, the cell c gets
disposed, and its (affine) contents are safely discarded.
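One way to picture this disposal behaviour is a usage counter, as in the Python sketch below. This is an assumption made purely for illustration: the class `SharedCell`, its counter and the `dispose` callback are hypothetical, and the text above does not say that CLASS is implemented this way. Sharing adds a usage, releasing removes one, and the contents are discarded when the last usage goes away.

```python
class SharedCell:
    """Runtime sketch: a cell is disposed once every usage has been released."""

    def __init__(self, contents, dispose):
        self._contents = contents
        self._dispose = dispose     # called with the contents on disposal
        self._usages = 1            # the usage created together with the cell
        self.disposed = False

    def share(self):
        """Split one usage in two, as share c {P || Q} does for P and Q."""
        if self.disposed:
            raise RuntimeError("cell already disposed")
        self._usages += 1

    def release(self):
        """Drop one usage; the last release disposes the cell contents."""
        if self.disposed:
            raise RuntimeError("cell already disposed")
        self._usages -= 1
        if self._usages == 0:
            self.disposed = True
            self._dispose(self._contents)


log = []
cell = SharedCell("flag", log.append)
cell.share()                 # two threads now own a usage
cell.release()               # the first thread drops its usage
after_first = list(log)      # nothing has been disposed yet
cell.release()               # the second release disposes the contents
```

In CLASS the fact that every usage is eventually released follows from typing, whereas this sketch relies on the caller's discipline.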


Our memory cells are linear objects, with a linear mutable payload,
which are never duplicated by reduction or conversion rules. However, in CLASS,
multiple cell usages may be shared between concurrent threads, which compete
to take and use the cell in interleaved critical sections. Such aliased usages may
be passed around and duplicated dynamically, changing the sharing topology at runtime.


Sharing of cell usages is logically expressed in our system by the typing rules
in Fig. 5. Co-contraction, introduced in Differential Linear Logic (DiLL) [35],
allows finite multisets of linear resources to safely interact in cut-reduction,
resolving concurrent sharing into nondeterminism, as required here to soundly model
memory cells and their linear concurrent usages. Rule [Tsh] interprets
cocontraction with the construct share c {P || Q}, and types sharing of the cell usage
c : U_f A between the concurrent threads P and Q.


Contrary to cut, share c {P || Q} is not a binding operator for c. The shared
usage c : U_f A is free in the conclusion of the typing rule, permitting c to be
shared among an arbitrary number of threads, by nested iterated use of [Tsh].
In [Tsh], P and Q only share the single mutex cell c, since the linear context is
split multiplicatively, just like [Tcut] w.r.t. binary sessions. This condition comes
from the DiLL typing discipline, and is important to ensure deadlock freedom.


While [Tsh] types sharing of a full (unlocked) cell usage of type U_f A, the
symmetric rules [TshL] and [TshR] type sharing of an empty (locked) cell usage
of type U_e A. We may verify that for every cell c in a well-typed process, at
most one unguarded operation on c may be using type U_e A; all the remaining
unguarded operations on c must be using type U_f A. This implies that, at runtime,
only one thread may own the lock for a given (necessarily empty) cell, and
execute a put to it, which will bring the cell back to full and release its lock;
other threads must be either attempting to take, or releasing the reference.


Working together, the sharing typing rules ensure that in any well-typed cell
sharing tree, at most one single thread at any time may be actively using a cell
(in the locked empty state) and put to it, thus guaranteeing mutual exclusion,
while satisfying Progress (Theorem 2) which in turn ensures deadlock absence,
even in the presence of the crucially blocking behaviour of the take operation.
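
The runtime behaviour described above (a cell that is either full and unlocked or empty and locked, a blocking take, a put that refills the cell, and disposal once every usage is released) can be approximated by the following Python sketch. It is only an illustration under stated assumptions: the class `MutexCell`, its methods and the usage counter are hypothetical names, and the guarantees that CLASS obtains statically by typing are replaced here by dynamic checks.

```python
import threading


class MutexCell:
    """Toy runtime model of a shared mutex cell (not the CLASS type system)."""

    def __init__(self, payload):
        self._cond = threading.Condition()
        self._payload = payload
        self._full = True        # full = unlocked, empty = locked
        self._usages = 1         # number of live usages (aliases) of the cell
        self.disposed = False

    def share(self):
        """Duplicate a usage, loosely mirroring share c {P || Q}."""
        with self._cond:
            self._usages += 1

    def take(self):
        """Block until the cell is full, then empty it and return the payload."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            self._full = False
            payload, self._payload = self._payload, None
            return payload

    def put(self, payload):
        """Refill an empty cell; only the thread that took should call this."""
        with self._cond:
            assert not self._full, "put requires the empty (locked) state"
            self._payload, self._full = payload, True
            self._cond.notify_all()

    def release(self):
        """Drop one usage; the last release disposes the cell and its contents."""
        with self._cond:
            self._usages -= 1
            if self._usages == 0:
                self._payload, self.disposed = None, True


def worker(cell, rounds):
    for _ in range(rounds):
        value = cell.take()      # critical section starts: the cell is empty
        cell.put(value + 1)      # critical section ends: the cell is full again
    cell.release()


def run(n_threads=4, rounds=500):
    cell = MutexCell(0)          # the main thread holds the initial usage
    threads = []
    for _ in range(n_threads):
        cell.share()
        threads.append(threading.Thread(target=worker, args=(cell, rounds)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    final = cell.take()
    cell.put(final)
    cell.release()               # last usage: the cell is disposed
    return final, cell.disposed
```

Because every increment happens between a take and the matching put, no update is lost, and the cell is disposed only after the last usage is released.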


2.1 Operational Semantics 


We now define CLASS operational semantics, which is given by a structural
precongruence relation ≤ that captures static relations on processes, essentially
rearranging them, and a reduction relation → that captures process interaction.


Definition 3 (P ≡ Q and P ≤ Q). Structural congruence ≡ is the least congruence
on processes closed under α-conversion and the ≡-rules in Fig. 6. Structural
precongruence ≤ is the least precongruence on processes including ≡ and closed
under α-conversion and the ≤-rules in Fig. 6.


The basic rules of ≡ essentially reflect the expected static laws, along the lines
of the structural congruences / conversions in [22,80]. The binary operators
forwarder, cut and share are commutative ([comm]). The set of processes modulo
≡ is a commutative monoid with binary operation given by parallel composition
and identity given by inaction 0 ([par]). Any two static constructs commute,
as expressed by the laws [CM]-[ShC!]. Furthermore, we can distribute the
unrestricted cut over all the static constructs as expressed by law [D-C!X], where *
stands for either a mix, linear or unrestricted cut or a share.

The commuting conversions [ShTake] and [ShPut] allow take and put operations
on cell usages to commute with a share construct. Rule [ShTake] picks
the take that occurs on the left argument; however, since share is commutative,
a right-biased version of [ShTake] is admissible. Using [ShTake], any of the
two possible interleavings for two concurrent takes may be nondeterministically
picked via ≤. Indeed, we express ≤ as a precongruence because it introduces
nondeterminism, and does not express a behavioural equivalence as ≡ does. N.B.:
Although one could easily formulate a confluent version of CLASS semantics,
using explicit sums as in [13,66,35,65], we prefer in this paper to focus on the
expressiveness of CLASS as a programming language and on its deadlock and
livelock absence properties, adopting a nondeterministic reduction relation.

In [ShPut] only a put, in the U∘ A-typed premise of [TshL], may be propagated
up and eventually update the cell, causing it to transit back to the full state.
Hence, take operations originating from the U• A-typed premise of [TshR] will be
blocked, waiting until such (unique) put propagation occurs. Algebraically, rule
[ShRel] expresses that the release operation is the identity for share composition;
we orient it as a precongruence to ensure type preservation.
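
To make the role of [ShRel], [ShPut] and [ShTake] concrete, the sketch below encodes a tiny fragment of the process syntax as Python dataclasses and computes every process reachable by one top-level use of these three rules, up to commutativity of share. The encoding (`Take`, `Put`, `Release`, `Share`, `steps`) is a hypothetical illustration, not the paper's formalisation: payloads and continuations are kept abstract, and only the root of the term is rewritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:          # release x
    cell: str


@dataclass(frozen=True)
class Take:             # take x(y); cont
    cell: str
    var: str
    cont: object


@dataclass(frozen=True)
class Put:              # put x(payload); cont
    cell: str
    payload: str
    cont: object


@dataclass(frozen=True)
class Share:            # share x {left || right}
    cell: str
    left: object
    right: object


def steps(p):
    """Processes q with p <= q by one use of [ShRel], [ShPut] or [ShTake] at the root."""
    out = set()
    if not isinstance(p, Share):
        return out
    x = p.cell
    # Trying both orders accounts for the commutativity law [comm] of share.
    for left, right in ((p.left, p.right), (p.right, p.left)):
        if left == Release(x):                                   # [ShRel]
            out.add(right)
        if isinstance(left, Put) and left.cell == x:             # [ShPut]
            out.add(Put(x, left.payload, Share(x, left.cont, right)))
        if (isinstance(left, Take) and left.cell == x
                and isinstance(right, Take) and right.cell == x):  # [ShTake]
            out.add(Take(x, left.var, Share(x, left.cont, right)))
    return out


# Two concurrent takes on the same cell: either one may be scheduled first.
two_takes = Share("c",
                  Take("c", "y1", Release("c")),
                  Take("c", "y2", Release("c")))
interleavings = steps(two_takes)
```

Here `interleavings` contains two distinct processes, one per choice of which take is commuted out of the share, which is the nondeterminism that the precongruence introduces.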


Definition 4 (Reduction →). Reduction → is defined by the rules of Fig. 7.


We let →* stand for the reflexive-transitive closure of →. Reduction includes
the set of principal cut conversions, i.e. the redexes for each pair of interacting
constructs. It is closed by structural precongruence ([≤]) and in rule [cong] we
consider that C is a static context, i.e. a process context in which the hole is
covered only by the static constructs mix, cut and share.

fwd x y ≡ fwd y x      P |x| Q ≡ Q |x| P      share x {P || Q} ≡ share x {Q || P}    [comm]
P || 0 ≡ P      P || Q ≡ Q || P      P || (Q || R) ≡ (P || Q) || R                   [par]
P |x| (Q || R) ≡ (P |x| Q) || R                                                      [CM]
P |x| (Q |y| R) ≡ (P |x| Q) |y| R                                                    [CC]
P |x| share y {Q || R} ≡ share y {P |x| Q || R}                                      [CSh]
P |z| (y.Q |!x| R) ≡ y.Q |!x| (P |z| R)                                              [CC!]
y.Q |!x| (P || R) ≡ P || (y.Q |!x| R)                                                [C!M]
y.P |!x : A| (w.Q |!z : B| R) ≡ w.Q |!z : B| (y.P |!x : A| R)                        [C!C!]
share x {P || (Q || R)} ≡ share x {P || Q} || R                                      [ShM]
share x {P || share y {Q || R}} ≡ share y {share x {P || Q} || R}                    [ShSh]
share z {P || y.Q |!x| R} ≡ y.Q |!x| share z {P || R}                                [ShC!]
y.P |!x : A| (Q * R) ≡ (y.P |!x : A| Q) * (y.P |!x : A| R)                           [D-C!X]
share x {release x || P} ≤ P                                                         [ShRel]
share x {put x(y.P); Q || R} ≤ put x(y.P); share x {Q || R}                          [ShPut]
share x {take x(y1); P1 || take x(y2); P2}
    ≤ take x(y1); share x {P1 || take x(y2); P2}                                     [ShTake]


Provisos: in [CM] and [ShM], x ∈ fn(Q); in [CC], [CSh] and [ShSh], x, y ∈ fn(Q); in
[CC!], [C!M] and [ShC!], x ∉ fn(P); in [C!C!], x ∉ fn(Q) and z ∉ fn(P).


Fig. 6: Structural congruence P ≡ Q and precongruence P ≤ Q.


Operationally, the forwarding behaviour is implemented by name substitu- 
tion [23] ([fwd]). All the other conversions apply to a principal cut between two 
dual actions. Reduction rules for the basic session constructs that interpret Sec- 
ond Order Linear Logic and recursion are the expected ones [22,27,81], along 
predictable lines. For readability, we omit the type declarations in the cuts, as 
they do not actually play any role in reduction. 

We comment on the rules concerning affinity. The interaction between an affine
session and a use operation is defined by reduction rule [∧∨u], where a cut on
a : ∧A between affine_{b,c} a; P and use a; Q reduces to a cut on a : A between the
continuations P and Q. The reduction between an affine session and a discard
operation is defined by [∧∨d]. A cut between affine_{b,c} a; P and discard a reduces
to a mix-composition of discards (for the coaffine sessions b) and releases (for
the cell usages c) (cf. [6,20]). In the corner case where b and c are empty, the
right-hand side of [∧∨d] simply degenerates to inaction 0 (the identity of mix).

The reductions for the mutable state operations are fairly self-explanatory. In
rule [S•U•r], a cut between a full cell and a release operation reduces
to a process that discards the affine cell contents, cf. rule [∧∨d]. In rule [S•U•t], a
cut on c : S• A between a full cell and a take operation reduces to a process with
two cuts, both composed with the continuation {a/a'}Q of the take. The outer
cut on a : ∧A composes with the stored affine session, which was successfully
acquired by the take operation. The inner cut on c : S∘ A composes with the
reference cell c, which has become empty in the reductum. Finally, in rule [S∘U∘],
a cut on session c : S∘ A between an empty cell and a put operation reduces to
a cut on session c : S• A between a full cell, that now stores the session that was
put, and the continuation of the put process. Notice that the locking/unlocking
behaviour of cells is simply modelled by rewriting of the process terms, from cell
to empty and back, as typical in process calculi.

fwd x y |y| P → {x/y}P                                                    [fwd]
close x |x| wait x; P → P                                                 [1⊥]
send x(y.P); Q |x| recv x(z); R → Q |x| (P |y| {y/z}R)                    [⊗⅋]
case x {|inl : P, |inr : Q} |x| x.inl; R → P |x| R                        [&⊕l]
case x {|inl : P, |inr : Q} |x| x.inr; R → Q |x| R                        [&⊕r]
!x(y); P |x| ?x; Q → y.P |!x| Q                                           [!?]
y.P |!x| call x(z); Q → {z/y}P |z| (y.P |!x| Q)                           [call]
sendty x(A); P |x| recvty x(X); Q → P |x| {A/X}Q                          [∃∀]
unfold_μ x; P |x| unfold_ν x; Q → P |x| Q                                 [μν]
unfold_μ x; P |x| corec Y(z, w); Q [x, y]
    → P |x| {x/z}{y/w}{corec Y(z, w); Q / Y}Q                             [corec]
affine_{b,c} a; P |a| use a; Q → P |a| Q                                  [∧∨u]
affine_{b,c} a; P |a| discard a → discard b || release c                  [∧∨d]
cell c(a.P) |c| release c → P |a| discard a                               [S•U•r]
cell c(a.P) |c| take c(a'); Q → P |a| (empty c |c| {a/a'}Q)               [S•U•t]
empty c |c| put c(a.P); Q → cell c(a.P) |c| Q                             [S∘U∘]
P ≤ P' and P' → Q' and Q' ≤ Q  ⟹  P → Q                                   [≤]
P → Q  ⟹  C[P] → C[Q]                                                     [cong]


Fig. 7: Reduction P → Q.
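
The three cell reductions discussed above can be read as a small state machine on the cell: a full cell may be released or taken, and an empty cell may only be put to. The Python sketch below is a hedged illustration of that reading; the tuple encoding of states and operations and the function names `step` and `run_usage` are assumptions made for this example, and payloads are plain values rather than affine sessions.

```python
def step(state, op):
    """One cell/usage interaction.

    state is ("cell", payload), ("empty",) or ("disposed",);
    op is ("release",), ("take",) or ("put", payload).
    Returns the new state and the payload handed to the usage, if any.
    """
    kind = op[0]
    if state[0] == "cell" and kind == "release":
        return ("disposed",), state[1]       # contents are discarded
    if state[0] == "cell" and kind == "take":
        return ("empty",), state[1]          # payload moves to the taker
    if state[0] == "empty" and kind == "put":
        return ("cell", op[1]), None         # cell becomes full again
    # Any other combination has no reduction rule; typing rules it out in CLASS.
    raise ValueError(f"no reduction for {kind} on a {state[0]} cell")


def run_usage(state, ops):
    """Apply a sequence of usage operations and record the visited states."""
    trace = [state]
    for op in ops:
        state, _ = step(state, op)
        trace.append(state)
    return trace


trace = run_usage(("cell", "a"), [("take",), ("put", "b"), ("release",)])
```

The trace alternates between the full and empty states exactly as the rewriting of cell into empty and back suggests, and an ill-formed sequence such as a put on a full cell is rejected dynamically here.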


3 Type Safety and Strong Normalisation 


In this section we state and give proof sketches for our main results of type safety 
and strong normalisation. Full proofs may be found in [65]. 


Type Preservation The semantics of CLASS is defined by a set of precongruence
≤ and reduction → rules on process terms. Theorem 1 shows that these
relations preserve typing, and gives substance to our PaT approach, showing that
every ≤ and → rule corresponds to a conversion on type derivations/proofs.


Theorem 1 (Type Preservation). Suppose P ⊢_η Δ; Γ. (1) If P ≤ Q, then
Q ⊢_η Δ; Γ. (2) If P → Q, then Q ⊢_η Δ; Γ.


Proof. By induction on derivations for P ≤ Q (resp. P → Q), we verify that all
the rules of ≤ (Def. 3) (resp. → (Def. 4)) are type preserving.


Progress We prove the progress property for well-typed CLASS processes. The 
following notion of live process becomes useful. A process P is live if and only 
if P = C[Q], for some static context C (the hole lies within the scope of static 
constructs mix, cut and share) and Q is an active process (a process with a 
topmost action prefix, such as a receive or a take, or a forwarder). We first 
show that a live well-typed process either reduces or offers an interaction with 
its environment on a free name. The following observability predicate (cf. [70]) 
characterises the interactions of a process with its environment.


Definition 5 (P ↓x). The predicate P ↓x is defined by the rules of Fig. 8.


The predicate P ↓x holds if P offers an immediate interaction (unguarded action)
on free name x. We can observe the subject of an action (rule [act]) and x, y
of a forwarder fwd x y. The definition of P ↓x is closed by ≤ and propagates
observations over the various static operators. Cut bound names are not free,
hence cannot be observed. Share share y {P || Q} propagates all the observations
x for which x ≠ y and, by applying ≤ rules [ShTake], [ShRel] or [ShPut] via [≤],
an interaction on x may be observed. We have


Lemma 1 (Liveness). Let P ⊢_∅ Δ; Γ be live. Either P ↓x or P reduces.


Proof. (Sketch) By induction on a derivation for P ⊢_∅ Δ; Γ, along the lines
of [27]. To handle case [Tcut] P = P1 |y| P2: both P1 and P2 are live, since both
type with a nonempty linear typing context, hence we can apply the induction
hypothesis (i.h.) to both premises of [Tcut]: either (i) one of P1 and P2 reduces
or (ii) both P1 ↓x1 and P2 ↓x2. If (i), then P reduces. Case (ii) follows because,
crucially, P1 and P2 synchronise through a single private session y, then either
x1 ≠ y or x2 ≠ y, in which case we can observe either x1 or x2; or x1 = x2 = y,
in which case we can trigger a reduction, by applying ≤ rules to P in order to
exhibit a principal cut. For case [Tsh] P = share y {P1 || P2}: since P1 and P2
are live, we apply i.h. to both premises. The interesting case occurs when P1 ↓x1
and P2 ↓x2. Co-contraction implies that P1 and P2 share the single usage y, so
if x1 ≠ y or x2 ≠ y, we have either P ↓x1 or P ↓x2. If both x1 = x2 = y,
then we derive P ↓y: the observation corresponds to either a take or a release
operation on y, which we commute up with [ShTake] or [ShRel]. For [TshL]
P = share y {P1 || P2}, we apply the i.h. to the premise P1, which types with
an empty usage on y. If P1 ↓y, then P ↓y, the observation corresponding to a put
operation on y, which we commute up with [ShPut]. Symmetrically for [TshR].


Theorem 2 (Progress). Let P ⊢_∅ ∅; ∅ be a live process. Then, P reduces.


Proof. Follows from Lemma 1 since fn(P) = ∅.

[Figure: rules [fwd], [act], [≤], [mix], [cut], [cut!] and [share], with conclusions
fwd x y ↓x, (P || Q) ↓x, (P |y| Q) ↓x, (z.P |!y| Q) ↓x and (share y {P || Q}) ↓x.]


Fig. 8: Observability Predicate P ↓x.


Remarkably, our proof of Theorem 2 leverages deep properties of Linear Logic, 
in particular the structure of the linear cut and co-contraction, allowing us to 
prove deadlock absence, even in a language with primitives exhibiting blocking 
behaviour, avoiding the use of extra mechanisms [47,33,48,10,25,76,31]. 


Strong Normalisation Establishing strong normalisation (SN) for concur- 
rent process calculi is usually fairly challenging, particularly in the presence 
of name passing, recursion and higher-order shared state [32,16,83,49,69]. For 
example, with reference cells one may express general recursion with Landin’s 
knot, and, in general, circular chains of references that may lead to divergence. 
However, our linear type system uses primitive recursion and corecursion, and 
excludes cyclic dependencies through state or session based interaction, allowing 
strong normalisation, and therefore livelock absence, to hold. Our proof relies 
on defining suitable linear logical relations, cf. [62,21,72], adapted to Classical 
Linear Logic [38,1,8], and crucially relying on a notion of reducibility up to in- 
terference that imposes stronger properties on the interpretation of the state 
modalities, and which allows the inductive proof of the Fundamental Lemma 2 
to go through in the usual way. To this end, we extend our basic language with
auxiliary constructs cell c(a.S) and empty c(a.S), which denote memory cells
subject to interference from concurrent writers, allowed to take terms from the
set S ⊆ {P | P ⊢_∅ a : ∧A}. The intuition is that a take on the cell may always
read any object from S, due to interference. We also consider the additional
(nondeterministic) reduction rules (1)-(3), where in (1) and (2) we assume P ∈ S.


cell c(a.S) |c| release c → P |a| discard a                          (1)
cell c(a.S) |c| take c(a'); Q → empty c(a.S) |c| (P |a| {a/a'}Q)     (2)
empty c(a.S) |c| put c(a.P); Q → cell c(a.S) |c| Q                   (3)


In this section, we thus consider reduction P → Q to be the relation defined
in Fig. 7, extended with these rules. When a take or a release interacts with
cell c(a.S), an arbitrary element P from the set S may be picked (rules (1) and
(2)). In (3), a put put c(a.P); Q interacts with empty c(a.S), causing empty c(a.S)
to evolve to cell c(a.S). The following notion is also useful. A process P is
S-preserving on x if P ⊢_∅ x : U• A or P ⊢_∅ x : U∘ A, and


- if P →*≈ take x(y); P' and Q ∈ S, then Q |y| P' is S-preserving on x.
- if P →*≈ put x(y.P1); P2, then P1 ∈ S and P2 is S-preserving on x.

A set of processes T is S-preserving on x if and only if, for all P ∈ T, P is
S-preserving on x. Intuitively, a process P that uses a cell x is S-preserving on x
if it only puts values from S on cell x. The notion of S-preservation, parametric
on any S, makes explicit the conditions needed for safe interaction with a
memory cell, subject to interference, while ensuring a state invariant S on the cell
contents. We now introduce the logical predicate.


Definition 6 (Logical Predicate ⟦x : A⟧_σ). By induction on the type A, we
define the sets ⟦x : A⟧_σ as shown in Fig. 9, such that ⟦x : U• A⟧_σ and ⟦x : U∘ A⟧_σ
are ⟦− : ∧A⟧_σ-preserving on x. The definition is direct for the positive types A;
for negative types B it is given by orthogonality.


The definition relies on Girard's notion of orthogonality S^⊥ = {P | ∀Q ∈
S. P |x| Q is SN} [37]. Duality promotes succinctness in our definition: for
negative types A, ⟦x : A⟧_σ is defined as the orthogonal of the predicate for its dual
(positive) type Ā. To handle polymorphic and inductive types, the logical predicate
is indexed by a map σ that assigns reducibility candidates R[x : A] to type
variables. A reducibility candidate R[x : A] is any set S of processes P ⊢_∅ x : A
such that P is SN and S = S^⊥⊥. We let R[− : A] be the set of all reducibility
candidates R[x : A] for some name x. The definition relies on a congruence
relation ≈ extending ≤ with a complete set of commuting conversions, along
standard lines [22,27,80]. It essentially plays the role of the labelled transition
system in the proof of strong normalisation given in [62].
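
The algebra behind orthogonality can be checked on a finite toy model. In the Python sketch below, the universe of "processes" is a set of integers and the test "P |x| Q is SN" is replaced by an arbitrary symmetric relation `compatible`; both are assumptions made purely for illustration. What carries over from the definition above is the shape of S^⊥ and the closure properties of double orthogonality that reducibility candidates rely on.

```python
def orth(S, universe, compatible):
    """S^⊥: every p in the universe that passes the test against all q in S."""
    return frozenset(p for p in universe if all(compatible(p, q) for q in S))


UNIVERSE = frozenset(range(6))


def compatible(p, q):
    # Hypothetical symmetric test standing in for "P |x| Q is SN".
    return (p + q) % 3 != 0


def closure(S):
    """S^⊥⊥, the biorthogonal closure of S."""
    return orth(orth(S, UNIVERSE, compatible), UNIVERSE, compatible)


S = frozenset({1})
S_perp = orth(S, UNIVERSE, compatible)
S_closed = closure(S)

# Standard facts for any symmetric test: S ⊆ S^⊥⊥, S^⊥⊥⊥ = S^⊥,
# and closing twice adds nothing, so S^⊥⊥ satisfies S = S^⊥⊥.
extensive = S <= S_closed
triple = orth(S_closed, UNIVERSE, compatible) == S_perp
idempotent = closure(S_closed) == S_closed
```

In this toy instance `S_closed` is strictly larger than `S`, which illustrates why candidates are required to be biorthogonally closed rather than arbitrary sets.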

We extend the logical predicate to typing judgements P ⊢_η Δ; Γ by universal
closure over the typing context and η.


Definition 7 (Extended Logical Predicate L⟦⊢_η Δ; Γ⟧_σ). We define L⟦⊢_η Δ; Γ⟧_σ
inductively on Δ, Γ and η as the set of processes P ⊢_η Δ; Γ s.t.


P ∈ L⟦⊢_∅ ∅; ∅⟧_σ iff P is SN.

P ∈ L⟦⊢_η Δ, x : A; Γ⟧_σ iff ∀Q ∈ ⟦x : Ā⟧_σ. Q |x : Ā| P ∈ L⟦⊢_η Δ; Γ⟧_σ.

P ∈ L⟦⊢_η Δ; Γ, x : A⟧_σ iff ∀Q ∈ ⟦y : Ā⟧_σ. y.Q |!x : Ā| P ∈ L⟦⊢_η Δ; Γ⟧_σ.

P ∈ L⟦⊢_{η, X(y) ↦ Δ'; Γ'} Δ; Γ⟧_σ iff ∀Q ∈ σ(Y). {Q/X}P ∈ L⟦⊢_η Δ; Γ⟧_σ.


We now state the Fundamental Lemma (Lemma 2), from which Theorem 3 follows.

Lemma 2 (Fundamental Lemma). If P ⊢_η Δ; Γ, then P ∈ L⟦⊢_η Δ; Γ⟧_σ.


Proof. (Sketch) By induction on P ⊢_η Δ; Γ. For cases [Tcell] and [Tempty], we
show that cell c(a.S) and empty c(a.S) respectively simulate cell c(a.P) (where
P ∈ S) and empty c, when composed with any S-preserving on c usages. To
handle one of the most challenging cases, [Tsh], we prove, for all S, and all
S-preserving on c processes P1 and P2, that cell c(a.S) |c| share c {P1 || P2} (1)
is simulated by (cell c(a.S) |c| P1) || (cell c(a.S) |c| P2) (2). This allows us to
infer that if (2) is SN, then so is (1). When S = ⟦a : ∧A⟧, the i.h. yields
(cell c(a.S) |c| Pi) SN, hence we conclude (2) SN. Similarly for [TshL], [TshR].


Theorem 3 (Strong Normalisation). If P ⊢_∅ ∅; ∅, then P is SN.

⟦x : X⟧_σ     ≜ σ(X)[x]
⟦x : 1⟧_σ     ≜ {P | P ≈ close x and P is SN}^⊥⊥
⟦x : A ⊗ B⟧_σ ≜ {P | ∃P1, P2. P ≈ send x(y.P1); P2 and
                     P1 ∈ ⟦y : A⟧_σ and P2 ∈ ⟦x : B⟧_σ}^⊥⊥
⟦x : A ⊕ B⟧_σ ≜ {P | ∃Q. P ≈ x.inl; Q and Q ∈ ⟦x : A⟧_σ or
                     P ≈ x.inr; Q and Q ∈ ⟦x : B⟧_σ}^⊥⊥
⟦x : !A⟧_σ    ≜ {P | ∃Q. P ≈ !x(y); Q and Q ∈ ⟦y : A⟧_σ}^⊥⊥
⟦x : ∃X.A⟧_σ  ≜ {P | ∃Q, S ∈ R[− : B]. P ≈ sendty x(B); Q and
                     Q ∈ ⟦x : A⟧_{σ[X ↦ S]}}^⊥⊥
⟦x : μX.A⟧_σ  ≜ (⋂{S ∈ R[− : μX.A] | unfold_μ x; ⟦x : A⟧_{σ[X ↦ S]} ⊆ S})
⟦x : ∧A⟧_σ    ≜ {P | ∃Q. P ≈ affine x; Q and Q ∈ ⟦x : A⟧_σ}^⊥⊥
⟦x : S• A⟧_σ  ≜ {P | P ≈ cell x(y.⟦y : ∧A⟧_σ) and P is SN}^⊥⊥
⟦x : S∘ A⟧_σ  ≜ {P | P ≈ empty x(y.⟦y : ∧A⟧_σ) and P is SN}^⊥⊥
⟦x : B⟧_σ     ≜ (⟦x : B̄⟧_σ)^⊥    (B negative type)


Fig. 9: Logical Predicate ⟦x : A⟧_σ.


4 Typeful Concurrent Programming in CLASS 


In this section, we discuss the expressiveness of CLASS’s type system, going 
through a sequence of illustrative realistic concurrent programming idioms. 


Sharing a Linear Session. Our first example illustrates how objects subject 
to a linear usage protocol and satisfying an invariant may be shared among 
multiple concurrent clients by serialising linear usages using a mutex cell, al- 
ternating ownership from the cell to clients and back at the invariant state, a 
commonly used discipline to implement and reason about resource sharing (see, 
e.g., [39,17,9]). We illustrate with a basic toggle switch with two states, On and
Off (the resource invariant is the state Off), and two operations #turnOn and
#turnOff that must be executed in strict linear sequence (Fig. 10). The toggle
protocol, defined by type Off, offers the single option #turnOn, after which it 
evolves to On. Conversely, type On offers the single option #turnOff, after which 
it evolves to an affine Off. The toggle process at t is defined by two mutually 
corecursive processes on(t) and off(t), which define the expected behaviour, and 
comply with types On and Off. 

Process main() introduces a mutex cell c storing an affine toggle object at the
invariant type ∧Off. It then shares it with two concurrent clients; each acquires
the toggle in the invariant type and uses the linear protocol independently. After
their linear interaction, they put back the toggle; the type system ensures that
this can only happen when the invariant (given by the cell type) holds. When
they are done, both clients release their respective usages of c, which ultimately
leads to the cell being deallocated and the (affine) toggle to be discarded.

type corec Off = &{|#turnOn : On}
type corec On = &{|#turnOff : ∧Off}

off(t) ⊢ t : Off
off(t) = case t {|#turnOn : on(t)}

on(t) ⊢ t : On
on(t) = case t {|#turnOff :
            affine t; off(t)}

client1(c) ⊢ c : S• Off
client1(c) = take c(t);
    #turnOn t; #turnOff t;
    put c(t); release c

client2(c) ⊢ c : S• Off
client2(c) = take c(t);
    #turnOn t; #turnOff t;
    #turnOn t; #turnOff t;
    put c(t); release c

main() ⊢ ∅
main() = cut {cell c(t.affine t; off(t))
    |c|
    share c {
        client1(c) ||
        client2(c) }}

Fig. 10: Sharing a Linear Toggle Switch
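
For readers more familiar with mainstream threading libraries, the toggle example can be approximated in Python as follows. This is a sketch under explicit assumptions: a `queue.Queue` of capacity one plays the role of the mutex cell (get is take, put is put), the class `Toggle` and the counter `switches` are hypothetical, and the protocol and invariant that CLASS checks statically are only asserted at run time here.

```python
import queue
import threading


class Toggle:
    """A switch whose operations must alternate: turn_on, turn_off, ..."""

    def __init__(self):
        self.state = "Off"
        self.switches = 0

    def turn_on(self):
        assert self.state == "Off", "protocol violation"
        self.state = "On"
        self.switches += 1

    def turn_off(self):
        assert self.state == "On", "protocol violation"
        self.state = "Off"


def client(cell, rounds, log):
    toggle = cell.get()              # take: blocks while another client owns it
    for _ in range(rounds):
        toggle.turn_on()
        toggle.turn_off()
    assert toggle.state == "Off"     # the invariant must hold before the put
    log.append(rounds)               # safe: we still own the toggle exclusively
    cell.put(toggle)                 # put the toggle back at the invariant state


def main():
    cell = queue.Queue(maxsize=1)    # stand-in for the mutex cell c
    cell.put(Toggle())
    log = []
    clients = [threading.Thread(target=client, args=(cell, r, log)) for r in (1, 2)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    toggle = cell.get()
    return toggle.state, toggle.switches, sorted(log)
```

Each client runs its whole on/off sequence while it owns the toggle, so the two linear usages are serialised rather than interleaved.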
type rec SList(A) = S• List(A)
type rec List(A) = ⊕{
    |#Null : 1,
    |#Next : ∧A ⊗ SList(A)}

nil(l) ⊢ l : ∧List(A)
nil(l) = affine l; #Null l; close l

cnext(a, c, l) ⊢ a : ∨A, c : SList(A), l : ∧List(A)
cnext(a, c, l) = affine l;
    #Next l;
    send l(a);
    fwd l c

append(c, l', c') =
    take c(l);
    case l {
        |#Null :
            wait l; put c(l'); fwd c c'
        |#Next :
            recv l(a);
            cut {
                append(l, l', x)
                |x|
                put c(y.cnext(a, x, y));
                fwd c c' }}


Fig. 11: A Linked List with an Append In-Place Operation.


We have also developed CLASS code for a generic (polymorphic) wrapper 
factory that, for any affine corecursive protocol, generates a wrapper to a general 
invariant-based sharing interface. 


Linked Lists, Update In-Place. In this example, we show how inductive/coinductive
types combine harmoniously with CLASS state modalities to type
linked data structures with memory-efficient updates in-place. Specifically, we
show how to code a linked list, parametric on the type A of its affine values,
with update in-place append (Fig. 11). An object of type SList(A) is a (full) cell
storing a List(A) object. An object of type List(A) is a session that either selects
#Null (the list is empty), in which case it closes; or selects #Next, in which case
it sends an affine session ∧A representing the head element and continues as the
tail SList(A). Process nil(l) defines an empty list at l, and process cnext(a, c, l)
constructs a nonempty list l with head a and tail c. For example, a list with
elements a, b stored at c1 : S• List(A) is represented by


cut {cell c1(l1.cnext(a, c2, l1)) |c2| cell c2(l2.cnext(b, c3, l2)) |c3| cell c3(l3.nil(l3))}


Process append(c, l', c') ⊢ c : SList(A), l' : List(A), c' : SList(A) produces on c'
the result of appending l' (in place) to c. It takes the list l stored in c, and then
performs case analysis on l. If l selects #Null, it simply replaces the previous null
node of c by l' and forwards the updated cell c to the output c'. This corresponds
to the recursion base case in which the list l is empty.

If l selects #Next, in which case l has at least one element, it receives at l
the node element a : ∨A, corecursively calls append of l' on the tail l : SList(A),
and puts back in c the element a and the tail x “returned” by the call. Notice that
x is exactly l (by forwarding), which was passed along linearly. Remarkably,
the append(c, l', c') operation just defined may be safely applied concurrently
to the same shared linked list, with the final result being the correct one (some
serialisation of the appends), without deadlocks or livelocks. It is also interesting
to see how the type system forbids a list to be appended to itself.
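
The in-place append and its behaviour under concurrency can be illustrated with a short Python sketch. The encoding is an assumption made for this example: each list cell is a `queue.Queue` of capacity one holding either `None` (for #Null) or a pair `(head, tail_cell)` (for #Next), and `append` is written iteratively, putting each node back before moving to the tail, rather than corecursively as in Fig. 11.

```python
import queue
import threading


def new_cell(node):
    cell = queue.Queue(maxsize=1)
    cell.put(node)
    return cell


def from_list(items):
    """Build a chain of cells; the last cell stores None (the null node)."""
    cell = new_cell(None)
    for item in reversed(items):
        cell = new_cell((item, cell))
    return cell


def append(cell, node2):
    """Replace the null node at the end of the list in `cell` by `node2`."""
    while True:
        node = cell.get()            # take: lock this cell
        if node is None:
            cell.put(node2)          # base case: update in place
            return
        cell.put(node)               # put the node back unchanged
        cell = node[1]               # continue on the tail cell


def to_list(cell):
    out = []
    while True:
        node = cell.get()
        cell.put(node)
        if node is None:
            return out
        out.append(node[0])
        cell = node[1]


def concurrent_appends():
    base = from_list([0])
    # Taking the content out of a fresh cell yields the list node to append.
    suffixes = [from_list([10, 11]).get(), from_list([20, 21]).get()]
    threads = [threading.Thread(target=append, args=(base, s)) for s in suffixes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return to_list(base)
```

Whichever thread reaches the null node first wins, and the other continues down the extended list, so the result is always one of the two serialisations of the appends.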

We have also developed many other in-place operations on linked data struc- 
tures, such as insertion sort, and other kinds of linked structures such as queues 
and binary search trees. In the next examples we discuss a shared queue ADT 
with a fine-grained locking discipline and O(1) enqueue and dequeue operations. 


A Concurrent Shareable Buffered Channel. We illustrate increased de- 
grees of sharing in a mutable data structure with various references pointing to 
different parts of it, how the CLASS type system may express interfaces that 
talk about different client views for using a stateful object, and the use of poly- 
morphism to implement information hiding ensuring that client code will never 
break the representation invariants of stateful ADTs, particularly challenging 
when aliasing and sharing are involved. 

More concretely, we consider a shareable buffered channel (Fig. 12), and 
provide a realistic and efficient implementation [56] based on a message queue 
represented by a linked list with update-in-place (cf. Section 4 above) and two 
independent pointers: one to the head of the list, used for receiving, and another 
to the tail, used for sending. The operations are executed in O(1) time. Moreover 
we provide a typing with two separate send and receive views, which may be 
used by an arbitrary number of concurrent clients. In particular, when the list 
is nonempty, both send and receive run in true concurrency (asynchronously), 
without blocking each other, thanks to fine-grained locking. 

The buffered channel type BChan(M), where M is the type of messages,
offers two views: SendT(M) and RecvT(M), interfaces for sender and receiver
endpoint clients. These views are exposed with a par (⅋), since they share an
underlying resourceful structure. In fact, they could not be exported using a
tensor (⊗); it is interesting to notice how the type system imposes these constraints,
important to ensure deadlock freedom. The representation type of both views is
Rep = S• SList(M) (see Section 4), hidden behind the SV and RV existential
types [29,58]; sending clients use a cell storing a reference to the tail node of
the queue; receiving clients use a cell storing a reference to the head node of the
queue.

type BChan(M) = SendT(M) ⅋ RecvT(M)
type SendT(M) = ∃SV. !MenuS(M, SV) ⊗ SV
type RecvT(M) = ∃RV. !MenuR(M, RV) ⊗ RV

type MenuS(M, SV) = &{
    #Send : SV ⊸ ∧M ⊸ SV,
    #Share : SV ⊸ (SV ⅋ SV),
    #Free : SV ⊸ 1 }

type MenuR(M, RV) = &{
    #Recv : RV ⊸ (Maybe(∧M) ⊗ RV),
    #Share : RV ⊸ (RV ⅋ RV),
    #Free : RV ⊸ 1 }

Rep = SV = RV = S• SList(M)

msend(me) =
    recv me(tailptr);
    recv me(a);
    take tailptr(c);
    take c(l);
    cut {
        cell c'(l)
        |c'|
        share c' {
            put c(l'.cnext(a, c', l'));
            release c'
            ||
            put tailptr(c');
            send me(tailptr);
            close me } }


Fig. 12: A Concurrent Shareable Buffered Channel.

Clients use the buffer through references of abstract type SV and RV and 
replicated menus !MenuS(M, SV) and !MenuR(M, RV). Both menus export the 
options #Share and #Free to allow sharing and release of the views. To send, a 
client selects #Send, sends his handle (of opaque type SV), the message to send 
and receives the (linear) handle back. In this implementation, receive is non- 
blocking, so operation #Recv returns a Maybe(AM) value: the client receives 
either #Nothing (if the buffer is empty) or #Just followed by a message a,
otherwise. In Section 4 we discuss the implementation, in CLASS, of (Hoare style)
monitors with conditions, which would allow a blocking receive to be implemented.

Process msend(me) implements the #Send “method”. It first receives the 
sending view handle (of concrete type Rep), which is a cell with the tailptr, and 
the message a to be sent. Then, a new cell c’ with nil (l) is created, the current 
tail of the list c is updated with a new node storing a and pointing to c’. Finally, 
the tailptr cell is updated to point to the new tail node c’ of the linked list. 
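
The steps of msend, together with a matching non-blocking receive, can be sketched in Python as a queue with separate head and tail pointer cells. This is an illustration under assumptions, not the paper's code: cells are modelled by `queue.Queue` objects of capacity one, the class `BChan` and its methods are hypothetical names, and a received message is returned as a one-element tuple to play the role of #Just, with `None` for #Nothing.

```python
import queue
import threading


def new_cell(value):
    cell = queue.Queue(maxsize=1)
    cell.put(value)
    return cell


class BChan:
    def __init__(self):
        node = new_cell(None)            # empty list: a cell storing the null node
        self.head = new_cell(node)       # receive view: pointer to the head node
        self.tail = new_cell(node)       # send view: pointer to the tail node

    def send(self, msg):
        c = self.tail.get()              # take tailptr(c)
        c.get()                          # take c(l): l is the null node
        c2 = new_cell(None)              # new tail cell storing a null node
        c.put((msg, c2))                 # old tail now stores msg and points to c2
        self.tail.put(c2)                # the tail pointer moves to the new tail

    def recv(self):
        c = self.head.get()
        node = c.get()
        if node is None:                 # buffer empty: restore both cells
            c.put(None)
            self.head.put(c)
            return None                  # plays the role of #Nothing
        msg, nxt = node
        self.head.put(nxt)               # the head pointer moves forward
        return (msg,)                    # plays the role of #Just msg


def demo(n_producers=2, per_producer=50):
    ch = BChan()

    def producer(tag):
        for i in range(per_producer):
            ch.send((tag, i))

    threads = [threading.Thread(target=producer, args=(t,)) for t in range(n_producers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    received = []
    while (item := ch.recv()) is not None:
        received.append(item[0])
    return received
```

Senders only contend on the tail pointer and receivers only on the head pointer; the two sides touch the same node cell only when the buffer is empty, which is the fine-grained locking discipline the text describes.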


Dining Philosophers. A resource hierarchy solution for the dining philosophers
problem [34] requires forks to be acquired in a defined order. We “encode”
such order in CLASS with an explicit (necessarily) acyclic structure, which
informs the type system about the code safety. This allows us to define a correct
implementation that satisfies deadlock freedom by pure linear logic typing. More
concretely, we organise the forks in a linked chain defined by the inductive types
rec Fork = S• Node and rec Node = ⊕{#Null : 1, #Next : Fork}.

Any fork in the chain may be shared by an arbitrary number of philosophers;
co-contraction ensures that philosophers cannot communicate between themselves
via any other channel, so all synchronisation must happen via the chained
forks. Furthermore, the chain can be resized and grow unboundedly to accommodate
an arbitrary number of philosophers. If a philosopher successfully takes a
fork f_i, he can then take any fork f_j, with i < j; crucially, he must follow the path
dictated by the chain, hence cannot acquire forks f_j with j < i. In Fig. 13 we
define the eat operation, which allows each philosopher P_i, with 0 ≤ i < k − 1, to
eat: it acquires two consecutive forks in the chain. We also define eat2, which is the
specific eating operation for the symmetry breaker P_{k−1}: it acquires the first fork, and
traverses the chain to acquire the last with takeLast(n, x) ⊢ n : Fork, x : Fork ⊗ 1.

putNull(f, f') ⊢ f : U∘ Node, f' : Fork
putNull(f, f') = put f(n.null(n)); fwd f f'

eat(f, f') ⊢ f : Fork, f' : Fork
eat(f, f') =
    take f(n);
    case n {
        |#Null :
            wait n; putNull(f, f')
        |#Next :
            take n(m);
            put n(m); put f(n'.next(n, n'));
            fwd f f' }

eat2(f, f') ⊢ f : Fork, f' : Fork
eat2(f, f') =
    take f(n);
    case n {
        |#Null :
            wait n; putNull(f, f')
        |#Next :
            cut {
                takeLast(n, x)
                |x|
                recv x(m); wait x;
                put f(n'.next(m, n'));
                fwd f f' } }


Fig. 13: The Dining Philosophers.
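
The resource hierarchy discipline itself is easy to state in a conventional threading setting. The Python sketch below is a generic illustration of that discipline, not a translation of Fig. 13: forks are `threading.Lock` objects indexed by their position in the chain, every philosopher acquires the lower-indexed fork first, and the function name `dine` and its parameters are hypothetical.

```python
import threading


def dine(k=5, rounds=100):
    """Run k philosophers for `rounds` meals each; returns meals eaten per seat."""
    forks = [threading.Lock() for _ in range(k)]
    meals = [0] * k

    def philosopher(i):
        left, right = i, (i + 1) % k
        # Always lock the lower-indexed fork first. Only the last philosopher
        # (the symmetry breaker) ends up taking its forks in the opposite order.
        first, second = min(left, right), max(left, right)
        for _ in range(rounds):
            with forks[first]:
                with forks[second]:
                    meals[i] += 1      # eat while holding both forks

    threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals
```

Since locks are always requested in increasing index order, no cycle of waiting philosophers can form, which is the informal counterpart of the acyclicity that the chained types make visible to the type system.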


A Barrier for N threads. We describe in Fig. 14 a CLASS implementation 
of a simple barrier, parametric on the number N of threads to synchronise. We 
find it interesting to model the “real” code shown in the Rust reference page for 
std::sync::Mutex [46]. The code uses if-then-else and primitive integers, as offered 
in our implementation, that could be defined as idioms in CLASS. We represent 
a barrier by a mutex cell storing a pair consisting of an integer n, holding the 
number of threads that have not yet reached the barrier, and a stack s of waiting 
threads, each represented by a session of affine type ∧⊥ (so they will be safely
aborted if at least one thread fails to reach the barrier). 


The type Barrier of the barrier is S• BState, where BState = Int ⊗ ∧List(∧⊥).
Initially the barrier is initialised with n = N threads and an empty stack, so that
the invariant n + depth(s) = N holds during execution. Each thread(c; i) acquires
the barrier c and checks if it is the last thread to reach the barrier (if n == 1): in
this case, it awakes all the waiting threads (awakeAll(ws)) and resets the barrier.
Otherwise, it updates the barrier by decrementing n and pushing its continuation
into the stack (the continuation for thread i just prints “finished”). The following
process main() ⊢ ∅ creates a new barrier c and spawns N threads, each labelled
by a unique id i: main() = cut {cell c(ws.init(ws)) |c| spawnAll(c; 0, N)}. Again,
our type system statically ensures that the code does not deadlock or livelock.

init(ws) ⊢ ws : ∧BState
init(ws) =
    affine ws; send ws(N); affine ws; nil(ws)

awakeAll(ws : List(∧⊥))
awakeAll(ws) =
    case ws {
        #Nil : wait ws; 0
        #Cons :
            recv ws(w);
            par {close w || awakeAll(ws)} }

spawnAll(c; i, n) ⊢ c : Barrier; i : Int, n : Int
spawnAll(c; i, n) =
    if (n == 0) {release c}
    {share c {
        thread(c; i)
        ||
        spawnAll(c; i + 1, n − 1)}}

thread(c; i) ⊢ c : Barrier; i : Int
thread(c; i) =
    println i + “: waiting.”;
    take c(ws); recv ws(n);
    if (n == 1) {
        par {
            println i + “: finished.”;
            awakeAll(ws)
            ||
            put c(ws'.init(ws'));
            release c}}
    {cut {
        affine w; wait w;
        println i + “: finished.”; 0
        |w| put c(ws'.affine ws';
                send ws'(n − 1);
                affine ws';
                cons(w, ws, ws'));
            release c}}


Fig. 14: A Barrier for N Threads
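
The barrier logic (a cell holding the pair of a counter and a stack of pending continuations, with the last arriving thread waking everyone) can be mimicked in Python as below. It is a simplified sketch under assumptions: the cell is a `queue.Queue` of capacity one, continuations are plain callables that the last thread runs sequentially rather than in parallel, and `run_barrier` and `say` are hypothetical names.

```python
import queue
import threading


def run_barrier(n_threads):
    cell = queue.Queue(maxsize=1)        # mutex cell storing (n, waiting stack)
    cell.put((n_threads, []))
    log = []
    log_lock = threading.Lock()

    def say(message):
        with log_lock:
            log.append(message)

    def thread(i):
        say(f"{i}: waiting.")
        n, waiting = cell.get()          # acquire the barrier state

        def finished():
            say(f"{i}: finished.")       # this thread's continuation

        if n == 1:
            cell.put((n_threads, []))    # last thread: reset the barrier
            finished()
            for wake in waiting:         # wake all the waiting continuations
                wake()
        else:
            # Not last: decrement n and push the continuation on the stack.
            cell.put((n - 1, [finished] + waiting))

    threads = [threading.Thread(target=thread, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every "waiting" line is logged before the corresponding thread takes the cell, and "finished" lines are only produced after the last take, so the log always shows all waits before any finish.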


A Hoare Style Monitor. A Hoare style monitor is a well-known powerful
programming abstraction [39], allowing concurrent operations on shared data to
be coordinated in a sound way, so that the data always satisfies a correctness invariant.
The key idea is that concurrent client threads use the monitor lock to
access the protected state in mutual exclusion, but may also wait (via an await
primitive) inside the monitor until the state satisfies specific (pre-)conditions,
while transferring state ownership to other threads potentially responsible for
establishing such conditions and announcing it (via a notify primitive).

We discuss a CLASS implementation of a monitor, sketching the main
components and how they are typed (Fig. 15). We consider a counter with value n,
with increment #Inc and decrement #Dec operations, and subject to the
invariant n ≥ 0. The type of the counter Counterl exposes two separate, coinductively
defined, client interfaces Decl and Incl for decrementing and incrementing.

While the #Inc operation is synchronous, the #Dec operation is always called 
asynchronously by passing a continuation (of type ContDec). This allows decre- 
menters to wait inside the monitor for condition NZ (n > 0) when n = 0. The 
condition NZ is represented by a wait queue of type WaitQ. The representation 
type of the monitor (Rep) holds the counter value and the wait queue. Each node 
in the wait queue stores information, of type ContDecW, for the waiting thread. 
type corec Incl ≜ &{|#Inc : Incl, |#End : ⊥}
type corec Decl ≜
    ∨&{|#Dec : ∨(ContDec ⊸ ⊥), |#End : ⊥}
type corec ContDec ≜ ∨(Decl & 1)
type Counterl ≜ Decl ⅋ Incl
type rec Rep ≜ (!Int) ⊗ WaitQ
type rec WaitQ ≜ ∧⊕{|#Null : 1, |#Next : NodeQ}
type rec NodeQ ≜ S_f(ContDecW ⊗ WaitQ)
type rec ContDecW ≜ ∧(∧Rep ⊸ ∧Rep ⊗ Decl ⊸ 1)

awaitNZ ⊢ m : U_e Rep,
    n : Int, w : WaitQ, cc : ContDecW
notifyNZ ⊢ m : U_e Rep, s : Rep, m′ : S_f Rep
incloop ⊢ iv : Incl, m : U_f Rep

awaitNZ(m, n, w, cc) ≜
    put m(w′.affine w′;
          send w′(n);
          consWQ(cc, w, w′));
    release m

incloop(iv, m) ≜
    case iv {
        |#Inc :
            take m(r);
            recv r(n);
            cut { send s(n + 1); fwd s r
                |s| notifyNZ(m, s, m′)
                |m′| incloop(iv, m′) }
        |#End : wait iv;
            release m}


Fig. 15: Implementing a Counter Monitor with Await / Notify. 


Every such ContDecW object stores (1) the pending action on the internal
monitor state (of type ∧Rep ⊸ ∧Rep), to be executed after await returns, and (2) a
callback to the continuation provided by the external client in the asynchronous
call (of type Decl ⊸ ⊥).

The awaitNZ(m, n, w, cc) process implements the monitor wait operation,
used in the #Dec operation. It receives the (empty) cell usage m to the
monitor state, the integer value n (where n = 0), a reference w to the wait queue,
and the continuation cc. It pushes a new node onto the queue, puts the
monitor state back (thereby unlocking the cell m), and releases m. The incloop(iv, m) process
implements the counter Incl interface. The call to notifyNZ(m, s, m′) after
incrementing n causes a waiting Decl thread (if any) to be awakened, and to continue
by applying the pending action to the Rep state s in which n > 0 holds, before
passing the updated state m′ to the incloop recursive call. Affinity plays a key
role, allowing all data structures, including waiting continuations, to be safely
discarded at the end of any computation. We have only shown some code
snippets here; the complete code is available in the CLASS distribution.
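
To make the await/notify discipline above more concrete, here is a small Python sketch of a counter monitor with the same shape: increments are synchronous, decrements pass a continuation, and a decrement arriving when n = 0 is parked in a wait queue until a later increment notifies it. The class and method names (`CounterMonitor`, `inc`, `dec_async`) are hypothetical and the sketch is not the CLASS implementation; it only illustrates the protocol, assuming a single lock for the monitor state.

```python
import threading
from collections import deque


class CounterMonitor:
    """Counter with invariant n >= 0 and a wait queue for condition NZ (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._n = 0
        self._waitq = deque()  # parked continuations of pending decrements

    def inc(self):
        """Synchronous increment, followed by a notify on condition NZ."""
        with self._lock:
            self._n += 1
            cont = self._notify_nz()
        if cont is not None:
            cont()  # run the client's callback outside the lock

    def _notify_nz(self):
        # Called with the lock held and n > 0: wake one waiter, if any,
        # by applying its pending action (the decrement) to the state.
        if self._waitq:
            cont = self._waitq.popleft()
            self._n -= 1
            return cont
        return None

    def dec_async(self, cont):
        """Asynchronous decrement: cont is called once the decrement happened."""
        with self._lock:
            if self._n == 0:
                self._waitq.append(cont)  # await NZ inside the monitor
                return
            self._n -= 1
        cont()

    def value(self):
        with self._lock:
            return self._n
```

For instance, calling `dec_async` on a fresh monitor parks the continuation, and the next `inc` runs it while leaving the counter at 0.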

Our examples illustrate how our system types non-trivial concurrent code,
akin to real system-level code, involving higher-order state, rich sharing and
ownership transfer patterns, while ensuring deadlock freedom, livelock freedom and memory
safety. Our typing of sharing requires that only a single bundle of linear resources
may be shared by two independent threads. As our examples show, code can
often be structured in that way, so that bundles of many linear resources may be
safely shared by monitor-like structures, exposing informative typed interfaces.

The feasibility of CLASS is corroborated by our implementation [68] of a
fully-fledged type checker and interpreter, developed in Java (~15k), and packaged
with an extensive CLASS library of code and test suites (~10k), including all the
examples in this paper. Type checking is decidable in polynomial time, using
minimal type annotations, only on cut-bound names and function parameters;
the multiplicative rules are handled by lazy context splitting (cf. [41]). The
type checker ensures that corecursive calls are done on a session hereditarily
descendant from the corecursion parameter, a condition motivated by our SN
result (Theorem 3). We also support an unsafe corecursion mode, in which
this check is turned off, to type programs defined by general corecursion.

The type checker supports useful type inference and reconstruction abilities.
The interpreter uses the java.util.concurrent package [53], relying on primitives such as
fine-grained locks and condition variables to emulate the synchronous
interactions of CLASS sessions, and on a cached thread pool to manage the life cycle of
short-lived threads. Cell deallocation is implemented by reference counting: the count is
incremented on each share and decremented on each release. Forwarding redirects
the clients of a shared cell through a chain of forwarding pointers (cf. [9]).
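
The reference-counting scheme just described can be sketched in a few lines. The following Python fragment is only an illustration under our own assumptions (the actual interpreter is written in Java, and the names `SharedCell`, `share`, `release` and `on_free` are hypothetical): each share increments a counter, each release decrements it, and the cell contents are dropped when the count reaches zero.

```python
import threading


class SharedCell:
    """Reference-counted cell: freed when the last usage is released (sketch)."""

    def __init__(self, value, on_free=None):
        self._value = value
        self._refs = 1                  # the creator holds the first usage
        self._meta = threading.Lock()   # protects the counter only
        self._freed = False
        self._on_free = on_free         # hypothetical hook, used for testing

    def share(self):
        """Create one more usage of the cell."""
        with self._meta:
            if self._freed:
                raise RuntimeError("cell already deallocated")
            self._refs += 1
        return self

    def release(self):
        """Drop one usage; deallocate when no usages remain."""
        with self._meta:
            self._refs -= 1
            last = self._refs == 0
            if last:
                self._freed = True
                self._value = None
        if last and self._on_free is not None:
            self._on_free()

    @property
    def freed(self):
        with self._meta:
            return self._freed
```

A cell shared once must therefore be released twice before it is deallocated.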


5 Related Work 


Many resource-aware logics and type systems to tame shared state and interfer- 
ence have been proposed [3,18,57,77,44,17,60,61,24]. These systems adopt some 
form of linearity and/or affinity to support resourceful programming [75,30] and to model
failures/exceptions [28,59,20,36,52]. In CLASS, linearity allows us to control state
sharing, whereas affinity is useful to ensure memory safety and to represent 
safely finalizable or abortable computations. The hereditary session-discarding 
behaviour of affine sessions, modelled by rule [AVd], is also present in other 
works, e.g. [6,59,20]. 

CLASS builds on top of the PaT correspondence with Linear Logic [22,27,80], 
the logical principles for the state modalities being inspired by DiLL [35]. Recent 
works [43,9,10,7,50,64,67] also address the problem of sharing and nondetermin- 
ism in the setting of session-based PaT. In [67], reference cells may only store 
replicated sessions (of type !A), thus cannot refer to linear entities such as other 
cells or linear sessions, hence cannot represent many realistic programming id- 
ioms that CLASS does (see Section 4). Accommodating linear state in a pure 
PaT approach is thus addressed in this work with a novel, more fundamental 
approach. Furthermore, in [67], recursion is obtained via a system-F style encod- 
ing [79], which cannot model inductive stateful structures with updates in-place 
as we do with CLASS native inductive/coinductive types. 

The take/put operations of CLASS relate to Concurrent Haskell MVars [45]
and to the acquire/release operations of the manifest sharing session-typed
language SILLs [9,10]. Sharing in SILLs is based on shift modalities to move from
shared to linear mode and back, and contraction principles to alias shared
sessions. In CLASS we explore DiLL modalities and cocontraction principles [35]
to express sharing of linear state and put/take protocols of mutex memory
cells of invariant type. The work [10] ensures deadlock-freedom by relying on
programmer-provided partial orders on events [55,33,26], whereas in CLASS
deadlock-freedom follows from the same simple and general inductive argument as
the corresponding result in e.g. [22], thanks to the logical character of the new
proof rules (DiLL cocontraction, which enjoys cut-elimination). The work [64]
introduces the language CSLL, which extends linear logic with coexponentials that
support a notion of shared state, with an approach quite different from ours. CSLL
does not claim the ability to naturally express shared linked data structures with
update in-place and fine-grained locking, as CLASS does. Nevertheless, it is
natural to define in CLASS sessions exporting weakening, sharing and dereliction
capabilities for linear behaviours, as in our shared buffer example.

Recently, the work [43] develops λlock, a substructurally typed λ-calculus with
higher-order locks, which enjoys deadlock-freedom by imposing a set of high-level
principles that guarantee acyclicity of the lock-sharing topologies; these principles
follow in CLASS as a consequence of its logically motivated type system and DiLL’s
cocontraction. That work also extends λlock with partial orders, under which a
resource can be shared by more than two concurrent threads. None of the models
in [43,9,10,64] addresses livelock absence or memory safety, as CLASS does.

As far as we are aware, CLASS is a first proposal integrating shared state 
and recursion in a language based on PaT and Linear Logic, while guaranteeing 
strong normalisation. Least/greatest fixed points in Linear Logic were studied
in [8], which inspired the development of recursion in [54,73]; our treatment
of recursion draws inspiration from [73]. Several works exploit the technique of
logical relations to establish strong normalisation for concurrent process cal- 
culi [1,83,69,16,62]. The work [16] proves strong normalisation for a language 
with higher-order store with a type and effect system that stratifies memory 
into regions so as to preclude circularities. Interestingly, in CLASS such stratifi- 
cation is implicitly guaranteed by the acyclicity inherent to Linear Logic. Linear 
logical relations were studied in [62,21,72,74]. In this work we recast and ex- 
tend the technique to Classical Linear Logic, exploring orthogonality [38,8,1], 
and demonstrate, using a specially devised technique of interference-sensitive 
reducibility, how logical relations scale to accommodate shared state. 


6 Concluding Remarks 


We have introduced CLASS, a session-based language founded on a propositions- 
as-types interpretation of Second-Order Classical Linear Logic, extended with 
recursion, affine types, first-class mutex cells and shared linear state. We believe 
that CLASS is the first proposal of a language of its kind to provide the follow- 
ing three strong properties by static typing: well-typed CLASS programs enjoy 
progress, hence never deadlock, do not leak memory and always terminate. 
The metatheoretical properties of CLASS are obtained in a compositional and
modular way, by leveraging the key features of propositions-as-types, from which
the operational semantics and type system also emerge. In CLASS, types and
processes have a consistent proof-theoretical behaviour: typed program constructs
correspond exactly to proof rules, with a proper compositional semantics via
logical relations (Section 3). Programs are composed by plugging basic constructs
with the cut rule, and all interaction principles are captured by principal cut
reductions that act locally in proofs/type derivations (Def. 4). We also obtain
an algebraic system based on proof simplification to reason about program
(observational) equivalence, due to confluence (cf. [65]).

Besides the foundational relevance of our work, we also argued how CLASS 
can cleanly express realistic concurrent higher-order programming idioms, with 
many compelling examples. Any type system introduces conservative restric- 
tions on its language, but we believe that CLASS offers an interesting balance 
between the strong properties it ensures by typing and its expressiveness. In 
fact, we find the CLASS type system helpful to guide the development of safe
concurrent idioms, with a fairly light type annotation burden. As future work, we
would like to investigate several possible refinements of the CLASS type disci- 
pline, namely, allowing finer-grained resource-access policies to be expressed, and 
exploring the integration of dependent and refinement types [71,51], enhancing 
the logical expressiveness of the basic type system. 

Abstract. Program sensitivity measures the distance between the outputs
of a program when run on two related inputs. This notion, which
plays a key role in areas such as data privacy and optimization, has been 
the focus of several program analysis techniques introduced in recent 
years. Among the most successful ones, we can highlight type systems 
inspired by linear logic, as pioneered by Reed and Pierce in the Fuzz 
programming language. In Fuzz, each type is equipped with its own dis- 
tance, and sensitivity analysis boils down to type checking. In particular, 
Fuzz features two product types, corresponding to two different notions 
of distance: the tensor product combines the distances of each component 
by adding them, while the with product takes their maximum. 

In this work, we show that these products can be generalized to arbitrary
$L^p$ distances, metrics that are often used in privacy and optimization.
The original Fuzz products, tensor and with, correspond to the
special cases $L^1$ and $L^\infty$. To ease the handling of such products, we
extend the Fuzz type system with bunches, as in the logic of bunched
implications, where the distances of different groups of variables can be
combined using different $L^p$ distances. We show that our extension can be
used to reason about quantitative properties of probabilistic programs.


1 Introduction 


When developing a data-driven application, we often need to analyze its sensi- 
tivity, or robustness, a measure of how its outputs can be affected by varying 
its inputs. For example, to analyze the privacy guarantees of a program, we 
might consider what happens when we include the data of one individual in its 
inputs [11]. When analyzing the stability of a machine-learning algorithm, we 
might consider what happens when we modify one sample in the training set [7]. 
Such applications have spurred the development of several techniques to rea- 
son about program sensitivity [23,9]. One successful approach is based on linear- 
like [14] type systems, as pioneered in Reed and Pierce’s Fuzz language [23]. 
The basic idea behind Fuzz is to use typing judgments to track the sensitivity 
of a program with respect to each variable. Each type comes equipped with a 
notion of distance, and the typing rules explain how to update variable sensi- 
tivities for each operation. Because different distances yield different sensitivity 
analyses, it is often useful to endow a set of values with different distances, which 
leads to different Fuzz types. For example, like linear logic, Fuzz has two notions
of products: the tensor product $\otimes$ and the Cartesian product $\&$ (with). The first
one is equipped with the $L^1$ (or Manhattan) distance, where the distance
between two pairs is computed by adding the distances between the corresponding
components. The second one is equipped with the $L^\infty$ (or Chebyshev) distance,
where the component distances are combined by taking their maximum.


The reason for focusing on these two product types is that they play a key 
role in differential privacy [11], a rigorous notion of privacy that was the motivat- 
ing application behind the original Fuzz design. However, we could also consider 
equipping pairs with more general $L^p$ distances, which interpolate between the
$L^1$ and $L^\infty$ distances and are extensively used in convex optimization [8], information
theory [10] and statistics [15]. Indeed, other type systems for differential privacy
inspired by Fuzz [20] include types for vectors and matrices under the $L^2$ distance,
which are required to use the Gaussian mechanism, one of the popular building
blocks of differential privacy. Supporting more general $L^p$ metrics would allow
us to capture even more such building blocks [17,1], which would enable further 
exploration of the tradeoffs between differential privacy and accuracy. 


In this paper, we extend these approaches and show that Fuzz can be enriched
with a family of tensor products $\otimes_p$, for $1 \le p \le \infty$. These tensor products are
equipped with the $L^p$ distance, the original Fuzz products $\otimes$ and $\&$ corresponding
to the special cases $\otimes_1$ and $\otimes_\infty$. Moreover, each connective $\otimes_p$ is equipped with
a corresponding “linear implication” $\multimap_p$, unlike previous related systems where
such an implication only exists for $p = 1$. Following prior work [4,3], we give
our extension a semantics in terms of non-expansive functions, except that the
presence of the implications $\multimap_p$ forces us to equip input and output spaces with
more general distances where the triangle inequality need not hold.


A novelty of our approach is that, to support the handling of such products,
we generalize Fuzz environments to bunches, where each $L^p$ distance comes
with its own context former. Thus, we call our type system Bunched Fuzz. This 
system, inspired by languages derived from the logic of Bunched Implications 
(BI) [22] (e.g. [21]), highlights differences between the original Fuzz design and 
linear logic—for example, products distribute over sums in Fuzz and BI, but 
not in linear logic. While similar indexed products and function spaces have also 
appeared in the literature, particularly in works on categorical grammars [19], 
here they are employed to reason about vector distances and function sensitivity. 


While designing Bunched Fuzz, one of our goals was to use sensitivity to rea- 
son about randomized algorithms. In the original Fuzz, probability distributions 
are equipped with the max divergence distance, which can be used to state
differential privacy as a sensitivity property [23]. Subsequent work has shown how
Fuzz can also accommodate other distances over probability distributions [3]. 
However, such additions required variants of graded monads, which express the 
distance between distributions using indices (i.e. grades) on the monadic type of 
distributions over their results, as opposed to sensitivity indices on their inputs, 
as it was done in the original Fuzz. In particular, this makes it more difficult to 
reason about distances separately with respect to each input. Thanks to bunches, 
however, we can incorporate these composition principles more naturally. For
example, Bunched Fuzz can reason about the Hellinger distance on distributions
without the need for output grading, as was done in prior systems [3].

We will also see that, by allowing arbitrary $L^p$ norms, we can generalize prior
case studies that were verified in Fuzz and obtain more general methods for
reasoning about differential privacy (Section 5). Consider the $L^p$ mechanism [1,17],
which adds noise to the result of a query whose sensitivity is measured in the
$L^p$ norm. Since Fuzz does not have the means to analyze such a sensitivity
measure, it cannot implement the $L^p$ mechanism; Bunched Fuzz, however, can
analyze such a measure, and thus allows for a simple implementation in terms of the
exponential mechanism. Such a mechanism, in turn, can be used to implement a
variant of a gradient descent algorithm that works under the $L^p$ norm,
generalizing an earlier version that was biased towards the $L^1$ norm [25]. Summarizing,
our contributions are:


- We introduce Bunched Fuzz, an extension of Fuzz with types for general $L^p$
  distances: we add type constructors of the form $\otimes_p$ (for $1 \le p \le \infty$) for
  pairs under the $L^p$ distance along with constructors of the form $\multimap_p$ for their
  corresponding function spaces. To support the handling of such types, we
  generalize Fuzz typing contexts to bunches of variable assignments.

- We give a denotational semantics for Bunched Fuzz by interpreting programs
  as non-expansive functions over spaces built on $L^p$ distances.

- We show that Bunched Fuzz can support types for probability distributions
  for which the sampling primitive, which enables the composition of
  probabilistic programs, is compatible with $L^p$ distances.

- We show a range of examples of programs that can be written in Bunched
  Fuzz. Notably, we show that Bunched Fuzz can support reasoning about the
  Hellinger distance without the need for grading, and we show generalizations
  of several examples from the differential privacy literature.


Check the full version of this paper for more technical details [26]. 


2 Background 


2.1 Metrics and Sensitivity 


To discuss sensitivity, we first need a notion of distance. We call extended
pseudosemimetric space a pair $X = (|X|, d_X)$ consisting of a carrier set $|X|$ and an
extended pseudosemimetric $d_X : |X|^2 \to \mathbb{R}_{\ge 0}^{\infty}$, which is a function satisfying, for
all $x, y \in |X|$, $d_X(x, x) = 0$ and $d_X(x, y) = d_X(y, x)$. This relaxes the standard
notion of metric space in a few respects. First, the distance between two points
can be infinite, hence the extended. Second, different points can be at distance
zero, hence the pseudo. Finally, we do not require the triangle inequality:

$$d_X(x, y) \le d_X(x, z) + d_X(z, y), \tag{1}$$

hence the semi. We focus on extended pseudosemimetrics because they support
constructions that true metrics do not. In particular, they make it possible to
scale the distance of a space by $\infty$ and enable more general function spaces.
However, to simplify the terminology, we will drop the “extended pseudosemi”
prefix in the rest of the paper, and speak solely of metric spaces. On some
occasions, we might speak of a proper metric space, by which we mean a space where
the triangle inequality does hold (but not necessarily the other two requirements
that are missing compared to the traditional definition of metric space).

Given a function $f : X \to Y$ on metric spaces, we say that it is $s$-sensitive, for
$s \in \mathbb{R}_{\ge 0}^{\infty}$, if we have, for all $x_1, x_2 \in X$, $d_Y(f(x_1), f(x_2)) \le s \cdot d_X(x_1, x_2)$, where
we extend addition and multiplication to $\mathbb{R}_{\ge 0}^{\infty}$ by setting $\infty \cdot s = s \cdot \infty = \infty$.
We also say that $f$ is $s$-Lipschitz continuous, though the traditional definition of
Lipschitz continuity assumes $s \neq \infty$. If a function is $s$-sensitive, then it is also
$s'$-sensitive for every $s' \ge s$. Every function of type $X \to Y$ is $\infty$-sensitive. If a
function is 1-sensitive, we also say that it is non-expansive. We use $X \multimap Y$ to
denote the set of such non-expansive functions. The identity function is always
non-expansive, and non-expansive functions are closed under composition. Thus,
metric spaces and non-expansive functions form a category, denoted Met.
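
The definition of $s$-sensitivity can be tested empirically on a finite sample of points. The Python sketch below is only such a test, not a proof: `is_s_sensitive` and `scale` are hypothetical helper names, and `scale` implements the convention $\infty \cdot s = s \cdot \infty = \infty$ stated above.

```python
import itertools
import math


def scale(s, d):
    """Multiplication on [0, inf] with the convention inf * s = s * inf = inf."""
    if s == math.inf or d == math.inf:
        return math.inf
    return s * d


def is_s_sensitive(f, s, points, d_in, d_out):
    """Check d_out(f(x1), f(x2)) <= s * d_in(x1, x2) on all pairs of sample points."""
    return all(
        d_out(f(x1), f(x2)) <= scale(s, d_in(x1, x2))
        for x1, x2 in itertools.product(points, repeat=2)
    )


def compose(g, f):
    """Composition g after f; non-expansive maps are closed under it."""
    return lambda x: g(f(x))


def abs_dist(a, b):
    """The usual distance on numbers."""
    return abs(a - b)
```

On integer samples, doubling is 2-sensitive but not non-expansive, and composing two non-expansive maps (say, a shift and a clamp) stays non-expansive.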


2.2 Distances for Differential Privacy 


Among many applications, sensitivity is a useful notion because it provides a con- 
venient language for analyzing the privacy guarantees of algorithms—specifically, 
in the framework of differential privacy [11]. Differential privacy is a technique 
for protecting the privacy of individuals in a database by blurring the results of 
a query to the database with random noise. The noise is calibrated so that each 
individual has a small influence on the probability of observing each outcome 
(while ideally guaranteeing that the result of the query is still useful). 

Formally, suppose that we have some set of databases db equipped with a
metric. This metric roughly measures how many rows differ between two databases,
though the exact definition can vary. Let $f : \mathsf{db} \to DX$ be a randomized database
query, which maps a database to a discrete probability distribution over the set
of outcomes $X$. We say that $f$ is $\epsilon$-differentially private if it is an $\epsilon$-sensitive
function from db to $DX$, where the set of distributions $DX$ is equipped with
the following distance, sometimes known as the max divergence:

$$\mathrm{MD}_X(\mu_1, \mu_2) = \sup_{x \in X} \left| \ln \frac{\mu_1(x)}{\mu_2(x)} \right|. \tag{2}$$

(Here, we stipulate that $\ln|0/0| = 0$ and $\ln|p/0| = \ln|0/p| = \infty$ for $p \neq 0$.)

To understand this definition, suppose that $D_1$ and $D_2$ are two databases at
distance 1, for instance because they differ with respect to the data of a single
individual. If $f$ is $\epsilon$-differentially private, the above definition implies that $f(D_1)$
and $f(D_2)$ are at most $\epsilon$ apart. When $\epsilon$ is large, the probabilities of each outcome
in the result distributions can vary widely. This means that, by simply observing
one output of $f$, we might be able to guess with good confidence which of the
databases $D_1$ or $D_2$ was used to produce that output. Conversely, if $\epsilon$ is small,
it is hard to tell which database was used because the output probabilities will
be close. For this reason, it is common to view $\epsilon$ as a privacy loss: the larger it
is, the more privacy we are giving up to reveal the output of $f$.
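
As a small worked example of the max divergence, the following Python sketch computes it for finite distributions represented as dictionaries from outcomes to probabilities. The representation and the function names are our own assumptions; outcomes missing from a dictionary are taken to have probability 0, and the conventions for $\ln|0/0|$ and $\ln|p/0|$ follow the stipulation above.

```python
import math


def log_ratio(p, q):
    """|ln(p / q)| with the conventions ln|0/0| = 0 and ln|p/0| = ln|0/p| = inf."""
    if p == 0 and q == 0:
        return 0.0
    if p == 0 or q == 0:
        return math.inf
    return abs(math.log(p / q))


def max_divergence(mu1, mu2):
    """Max divergence between two finite distributions given as dicts."""
    outcomes = set(mu1) | set(mu2)
    return max(
        (log_ratio(mu1.get(x, 0.0), mu2.get(x, 0.0)) for x in outcomes),
        default=0.0,
    )
```

For `{"a": 0.5, "b": 0.5}` and `{"a": 0.25, "b": 0.75}` the worst-case ratio is attained at outcome "a", giving a privacy loss of ln 2.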

Besides providing a strong privacy guarantee, this formulation of closeness
for distributions provides two important properties. First, we can compose
differentially private algorithms without ruining their privacy guarantee. Note that
$DX$ forms a monad, where the return and bind operations are given as follows:

$$\mathrm{return}(x) = y \mapsto \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

$$f^{\dagger}(\mu) = y \mapsto \sum_{x \in X} \mu(x) \cdot f(x)(y). \tag{4}$$

Intuitively, the return operation produces a deterministic distribution, whereas
bind samples an element $x$ from $\mu$ and computes $f(x)$. When composing
differentially private algorithms, their privacy loss can be soundly added together:


Theorem 1. Suppose that $f : \mathsf{db} \to DX$ is $\epsilon_1$-differentially private and that
$g : \mathsf{db} \to X \to DY$ is such that the mapping $\delta \mapsto g(\delta)(x)$ is $\epsilon_2$-differentially private
for every $x$. Then the composite $h : \mathsf{db} \to DY$ defined as $h(\delta) = g(\delta)^{\dagger}(f(\delta))$ is
$(\epsilon_1 + \epsilon_2)$-differentially private.
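
The composition principle of Theorem 1 can be checked numerically on a toy instance. In the Python sketch below, distributions are dictionaries, `ret` and `bind` follow the monad operations described above, and the queries `f`, `g` and the helper `noisy_bit` are hypothetical examples of ours (a randomized-response style bit flip), with the two databases 0 and 1 assumed to be at distance 1. The check illustrates the theorem on this instance only.

```python
import math


def ret(x):
    """Return: the deterministic distribution concentrated on x."""
    return {x: 1.0}


def bind(mu, f):
    """Bind: sample x from mu, then sample from f(x)."""
    out = {}
    for x, px in mu.items():
        for y, py in f(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out


def max_divergence(mu1, mu2):
    """Max divergence between finite distributions given as dicts."""
    worst = 0.0
    for x in set(mu1) | set(mu2):
        p, q = mu1.get(x, 0.0), mu2.get(x, 0.0)
        if p == 0.0 and q == 0.0:
            continue
        if p == 0.0 or q == 0.0:
            return math.inf
        worst = max(worst, abs(math.log(p / q)))
    return worst


def noisy_bit(bit, eps):
    """Report the bit truthfully with probability e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}


EPS1, EPS2 = 0.5, 0.3  # illustrative privacy parameters


def f(db):
    return noisy_bit(db, EPS1)


def g(db):
    # For each fixed x, db -> g(db)(x) is an EPS2-private query.
    return lambda x: {(x, b): p for b, p in noisy_bit(db, EPS2).items()}


def h(db):
    """The composite of Theorem 1: bind f(db) with g(db)."""
    return bind(f(db), g(db))
```

Here the two neighbouring databases yield outputs of `h` whose max divergence does not exceed `EPS1 + EPS2`, as the theorem predicts.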


The other reason why the privacy metric is useful is that it supports many
building blocks for differential privacy. Of particular interest is the Laplace
mechanism, which blurs a numeric result with noise drawn from the two-sided Laplace
distribution. If $x \in \mathbb{R}$, let $\mathcal{L}(x)$ be the distribution with density⁴ $y \mapsto \frac{1}{2} e^{-|x - y|}$.


Theorem 2. The mechanism $\mathcal{L}$ is a non-expansive function of type $\mathbb{R} \to D\mathbb{R}$.⁵


Thus, to define an $\epsilon$-differentially private numeric query on a database, it suffices
to define an $\epsilon$-sensitive, deterministic numeric query, and then blur its result
with Laplace noise. Differential privacy follows from the composition principles
for sensitivity. This reasoning is justified by the fact that the Laplace mechanism
adds noise proportional to the sensitivity of the numeric query in $L^1$ distance.
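
To see why the Laplace mechanism is non-expansive, note that the log-ratio of the densities centred at $x_1$ and $x_2$, evaluated at any $y$, is $\big||x_2 - y| - |x_1 - y|\big| \le |x_1 - x_2|$ by the triangle inequality on the reals. The Python sketch below checks this bound on a grid of points; it works with the continuous densities directly (so it sidesteps the discretization issue of footnote 5) and the function names are hypothetical.

```python
import math


def laplace_density(x, y):
    """Density at y of the Laplace distribution with scale 1 centred at x."""
    return 0.5 * math.exp(-abs(x - y))


def worst_log_ratio(x1, x2, ys):
    """Largest |ln(density_x1(y) / density_x2(y))| over the sample points ys."""
    return max(
        abs(math.log(laplace_density(x1, y) / laplace_density(x2, y)))
        for y in ys
    )
```

On a grid over [-10, 10], the worst log-ratio for centres 0 and 1 stays below their distance 1, up to rounding.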


2.3 Sensitivity as a Resource 


Because differential privacy is a sensitivity property, techniques for analyzing 
the sensitivity of programs can also be used to analyze their privacy guarantees. 
One particularly successful approach in this space is rooted in type systems in- 
spired by linear logic, as pioneered by Reed and Pierce in the Fuzz programming 
language [16,23]. At its core, Fuzz is just a type system for tracking sensitivity. 


4 We use here a Laplace distribution with scale 1. 

5 The definitions do not quite match our setting, since $\mathcal{L}$ is a continuous, not a
discrete, distribution. The result can be put on firm footing by working with a
discretized version of the Laplace distribution [12].
Typing judgments are similar to common functional programming languages,
but variable declarations are of the form $x_i :_{r_i} \tau_i$: $x_1 :_{r_1} \tau_1, \ldots, x_n :_{r_n} \tau_n \vdash e : \sigma$.
The annotations $r_i \in \mathbb{R}_{\ge 0}^{\infty}$ are sensitivity indices, whose purpose is to track the
effect that changes to the program input can have on its output: if we have two
substitutions $\gamma$ and $\gamma'$ for the variables $x_i$, then the metric preservation property
of the Fuzz type system guarantees that

$$d(e[\gamma/\vec{x}], e[\gamma'/\vec{x}]) \le \sum_i r_i \cdot d(\gamma(x_i), \gamma'(x_i)), \tag{5}$$

where the metrics $d$ are computed based on the type of each expression and
value. This means that we can bound the distance on the results of the two
runs of $e$ by adding up the distances of the inputs scaled by their corresponding
sensitivities. When this bound is finite, the definition of the metrics guarantees
that the two runs have the same termination behavior. When $r_i = \infty$, the above
inequality provides no guarantees if the value of $x_i$ varies.

Fuzz includes data types commonly found in functional programming lan- 
guages, such as numbers, products, tagged unions, recursive types and functions. 
The typing rules of the language explain how the sensitivities of each variable 
must be updated to compute each operation. The simplest typing rule says that, 
in order to use a variable, its declared sensitivity must be at least 1:

$$\frac{r \ge 1}{\Gamma, x :_r \tau, \Delta \vdash x : \tau}$$


As a more interesting example, to construct a pair (e1,e2), the following rule 
says that we need to add the sensitivities of the corresponding contexts: 


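$$\frac{\Gamma_1 \vdash e_1 : \tau_1 \qquad \Gamma_2 \vdash e_2 : \tau_2}{\Gamma_1 + \Gamma_2 \vdash (e_1, e_2) : \tau_1 \otimes \tau_2}$$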


This behavior is a result of the distance of the tensor type $\otimes$: the distance
between two pairs in $\tau_1 \otimes \tau_2$ is the result of adding the distances between the
first and second components; therefore, the sensitivity of each variable for the
entire expression is the sum of the sensitivities for each component. In this sense,
sensitivities in Fuzz behave like a resource that must be distributed across all
variable uses in a program. For the sake of analogy, we might compare this
treatment to how fractional permissions work in separation logic: the predicate
$l \mapsto_q x$ indicates that we own a fraction $q \in [0, 1]$ of a resource stating that $l$
points to $x$. If $q = q_1 + q_2$, we can split this predicate as $l \mapsto_{q_1} x * l \mapsto_{q_2} x$,
allowing us to distribute this resource between different threads.

The distance on $\otimes$ corresponds to the sum in the upper bound in the
statement of metric preservation (Equation (5)). This distance is useful because it is
the one that yields good composition principles for differential privacy. This can
be seen in the typing rule for sampling from a probabilistic distribution:

$$\frac{\Gamma \vdash e_1 : \bigcirc\tau \qquad \Delta, x :_{\infty} \tau \vdash e_2 : \bigcirc\sigma}{\Delta + \Gamma \vdash \mathbf{mlet}\ x = e_1\ \mathbf{in}\ e_2 : \bigcirc\sigma}$$

Here, $\bigcirc\tau$ denotes the type of probability distributions over values of type $\tau$.
This operation samples a value $x$ from the distribution $e_1$ and uses this value
to compute the distribution $e_2$. We can justify the soundness of this rule by
reducing it to Theorem 1: the addition on contexts corresponds to the fact that
the privacy loss of a program degrades linearly under composition.

Besides the tensor product $\otimes$, Fuzz also features a with product $\&$, where
the distances between components are combined by taking their maximum. This
leads to a different typing rule for $\&$ pairs, which does not add up the sensitivities:

$$\frac{\Gamma \vdash e_1 : \tau_1 \qquad \Gamma \vdash e_2 : \tau_2}{\Gamma \vdash (e_1, e_2) : \tau_1 \mathbin{\&} \tau_2}$$

If we compare these rules for pairs, we see a clear analogy with linear logic: $\otimes$
requires us to combine contexts, whereas $\&$ allows us to share them. Fuzz’s
elimination rules for products continue to borrow from linear logic: deconstructing a
tensor gives both elements but deconstructing a with product returns only one.


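$$\frac{\Gamma \vdash e : \tau_1 \otimes \tau_2 \qquad \Delta, x :_r \tau_1, y :_r \tau_2 \vdash e' : \tau'}{\Delta + r\Gamma \vdash \mathbf{let}\ (x, y) = e\ \mathbf{in}\ e' : \tau'} \qquad \frac{\Gamma \vdash e : \tau_1 \mathbin{\&} \tau_2}{\Gamma \vdash \pi_i\, e : \tau_i}$$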


This partly explains why the connectives’ distances involve addition and max- 
imum. When using a tensor product, both elements can affect how much the 
output can vary, so both elements must be considered. (Note that Fuzz is an 
affine type system: we are free to ignore one of the product’s components, and 
thus we can write projection functions out of a tensor product.) When projecting 
out of a with product, only one of the elements will affect the program’s output, 
so we only need to consider the component that yields the maximum distance. 
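
The difference between the two product distances can be observed on the duplication map $x \mapsto (x, x)$, which shares its input between both components. The Python sketch below (helper names are ours) builds the two pair distances from component distances: under the tensor distance duplication doubles distances, whereas under the with distance it preserves them, matching the fact that $\&$ lets both components share one context.

```python
def tensor_distance(d1, d2):
    """Distance on pairs for the tensor product: add component distances."""
    return lambda a, b: d1(a[0], b[0]) + d2(a[1], b[1])


def with_distance(d1, d2):
    """Distance on pairs for the with product: take the maximum."""
    return lambda a, b: max(d1(a[0], b[0]), d2(a[1], b[1]))


def abs_dist(a, b):
    """The usual distance on numbers."""
    return abs(a - b)


def dup(x):
    """Duplication: uses its argument in both components of the pair."""
    return (x, x)


d_tensor = tensor_distance(abs_dist, abs_dist)
d_with = with_distance(abs_dist, abs_dist)
```

For inputs 1 and 4, at distance 3, the duplicated pairs are at distance 6 under the tensor distance and at distance 3 under the with distance.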

Fuzz uses the $!_s$ type for managing sensitivities. Intuitively, $!_s\tau$ behaves like $\tau$,
but with the distances scaled by $s$; when $s = \infty$, this means that different points
are infinitely apart. The introduction rule scales the sensitivities of variables in
the environment. This can be used in conjunction with the elimination rule to
propagate the sensitivity out of the type and into the environment.


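$$\frac{\Gamma \vdash e : \tau}{s\Gamma \vdash {!e} : {!_s\tau}} \qquad \frac{\Gamma \vdash e : {!_s\tau} \qquad \Delta, x :_{rs} \tau \vdash e' : \tau'}{\Delta + r\Gamma \vdash \mathbf{let}\ {!x} = e\ \mathbf{in}\ e' : \tau'}$$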


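Finally, the rules for the linear implication $\multimap$ are similar to the ones from
linear logic, but adjusted to account for sensitivities.


$$\frac{\Gamma, x :_1 \tau \vdash e : \sigma}{\Gamma \vdash \lambda x.\, e : \tau \multimap \sigma} \qquad \frac{\Gamma \vdash e : \tau \multimap \sigma \qquad \Delta \vdash e' : \tau}{\Gamma + \Delta \vdash e\, e' : \sigma}$$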


To introduce the linear implication $\multimap$, the bound variable needs to have
sensitivity 1. When eliminating $\multimap$, the environments need to be added. In categorical
language, addition, which is also present in the metric for $\otimes$, is connected to the
fact that there is an adjunction between the functors $X \otimes (-)$ and $X \multimap (-)$.
2.4 $L^p$ distances


The $L^1$ and $L^\infty$ distances are instances of a more general family of $L^p$ distances
(for $p \in \mathbb{R}^{\infty}_{\geq 1}$).⁶ Given a sequence of distances $\vec{x} = (x_1, \ldots, x_n) \in (\mathbb{R}^{\infty}_{\geq 0})^n$, we
first define the $L^p$ pseudonorm⁷ as follows: $\|\vec{x}\|_p = \left( \sum_{i=1}^{n} x_i^p \right)^{1/p}$. This definition
makes sense whenever the distances $x_i$ and $p$ are finite. When $p = \infty$, we define
the right-hand side as the limit $\max_{i=1}^{n} x_i$. When $x_i = \infty$ for some $i$, we define
the right-hand side as $\infty$. We have the following classical properties:


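Proposition 1 (Hölder inequality). For all $p, q \geq 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$,
and for all $\vec{x}, \vec{y} \in (\mathbb{R}^{\infty}_{\geq 0})^n$, we have: $\sum_{i=1}^{n} x_i y_i \leq \|\vec{x}\|_p \|\vec{y}\|_q$.
For $p = 2$, $q = 2$, this is the Cauchy-Schwarz inequality: $\sum_{i=1}^{n} x_i y_i \leq \|\vec{x}\|_2 \|\vec{y}\|_2$.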


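Proposition 2. For $1 \leq p \leq q$ we have, for $\vec{x} \in (\mathbb{R}^{\infty}_{\geq 0})^n$:

\[ \|\vec{x}\|_q \leq \|\vec{x}\|_p \tag{6} \]
\[ \|\vec{x}\|_p \leq n^{\frac{1}{p} - \frac{1}{q}}\, \|\vec{x}\|_q \tag{7} \]

In particular, for $p = 1$ and $q = 2$:

\[ \|\vec{x}\|_2 \leq \|\vec{x}\|_1 \leq \sqrt{n}\, \|\vec{x}\|_2 \tag{8} \]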
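The pseudonorm and the inequalities of Proposition 2 can be checked numerically. The following is a minimal sketch, not part of the original development; the names `lp_norm` and `check_prop2` are hypothetical helpers introduced only for illustration, and the small tolerance is an assumption made to absorb floating-point rounding.

```python
import math

def lp_norm(xs, p):
    """L^p pseudonorm of a sequence of distances in [0, inf], for p in [1, inf]."""
    xs = list(xs)
    # If some component is infinite, the pseudonorm is defined to be infinite.
    if any(math.isinf(x) for x in xs):
        return math.inf
    # For p = inf, the pseudonorm is the limit case: the maximum component.
    if math.isinf(p):
        return max(xs, default=0.0)
    return sum(x ** p for x in xs) ** (1.0 / p)

def check_prop2(xs, p, q, tol=1e-9):
    """Check inequalities (6) and (7) for 1 <= p <= q on finite inputs."""
    n = len(xs)
    norm_p, norm_q = lp_norm(xs, p), lp_norm(xs, q)
    ineq6 = norm_q <= norm_p + tol
    ineq7 = norm_p <= n ** (1.0 / p - 1.0 / q) * norm_q + tol
    return ineq6 and ineq7

if __name__ == "__main__":
    x = [3.0, 4.0]
    print(lp_norm(x, 1), lp_norm(x, 2), lp_norm(x, math.inf))
    print(check_prop2(x, 1, 2), check_prop2(x, 2, math.inf))
```

For the vector (3, 4), the three norms decrease as p grows, which is the content of inequality (6).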


The $L^p$ pseudonorms yield distances on tuples. More precisely, suppose that
$(X_i)_{1 \leq i \leq n}$ are metric spaces. The following defines a metric on $X = X_1 \times \cdots \times X_n$:


\[ d_p(\vec{x}, \vec{x}') = \left\| \left( d_{X_1}(x_1, x_1'), \ldots, d_{X_n}(x_n, x_n') \right) \right\|_p \]


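Proposition 3. For $1 \leq p \leq q$ we have, for $\vec{x}, \vec{x}' \in X_1 \times \cdots \times X_n$:

\[ d_q(\vec{x}, \vec{x}') \leq d_p(\vec{x}, \vec{x}') \leq n^{\frac{1}{p} - \frac{1}{q}}\, d_q(\vec{x}, \vec{x}') \tag{9} \]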


3 Bunched Fuzz: Programming with $L^p$ Distances


As we discussed earlier, the $L^1$ distance is not the only distance on products
with useful applications. In the context of differential privacy, for example, the
$L^2$ distance is used to measure the sensitivity of queries when employing the
Gaussian mechanism, a method for private data release that sanitizes data by
adding Gaussian noise instead of Laplacian noise.⁸

It is possible to extend a Fuzz-like analysis with $L^2$ distances by adding
primitive types and combinators for vectors. This was done, for instance, in


⁶ The $L^p$ distances can be defined with $p > 0$ but for simplicity of our treatment we
will only consider $p \geq 1$.

⁷ "pseudo-" because it can be infinite.

⁸ Technically, the Gaussian mechanism is used to achieve a relaxation of differential
privacy known as approximate, or $(\epsilon, \delta)$-differential privacy. Though this notion
cannot be analyzed directly by classical verification techniques for differential privacy,
it can be handled by recent extensions of Fuzz [3,20].

the Duet language [20], which provides the Gaussian mechanism as one of the
primitives for differential privacy. Such an extension can help verify a wide class 
of algorithms that manipulate vectors in a homogeneous fashion, but it makes 
it awkward to express programs that require finer grained access to vectors. 

To illustrate this point, suppose that we have a non-expansive function $f :
\mathbb{R}^2 \to \mathbb{R}$, where the domain carries the $L^2$ metric. Consider the mapping


\[ g(x, y) = f(2x, y) + f(2y, x). \]


How would we analyze the sensitivity of $g$? We cannot translate such a program
directly into a system like Duet, since it does not allow us to manipulate $L^2$
vectors at the level of individual components. However, we could rewrite the
definition of $g$ to use matrix operations, which could be easily incorporated in a
variant of Duet. Specifically, consider the following definition:


\[ g(\vec{x}) = f\left( \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} \vec{x} \right) + f\left( \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix} \vec{x} \right). \]


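The $L^2$ sensitivity of a linear transformation $\vec{x} \mapsto M\vec{x}$ can be easily computed
if we know the coefficients of the matrix $M$. Note that


\[
d_2(M\vec{x}, M\vec{y}) = \|M\vec{x} - M\vec{y}\|_2 = \|M(\vec{x} - \vec{y})\|_2
= \frac{\|M(\vec{x} - \vec{y})\|_2}{\|\vec{x} - \vec{y}\|_2}\, \|\vec{x} - \vec{y}\|_2
\leq \left( \sup_{\vec{z}} \frac{\|M\vec{z}\|_2}{\|\vec{z}\|_2} \right) d_2(\vec{x}, \vec{y}).
\]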


The quantity $\sup_{\vec{z}} \|M\vec{z}\|_2 / \|\vec{z}\|_2$, known as the operator norm of $M$, gives the
precise sensitivity of the above operation, and can be computed by standard
algorithms from linear algebra. In the case of $g$, both matrices have a norm of 2.
This means that we can analyze the sensitivity of $g$ compositionally, as in Fuzz:
addition is 1-sensitive in each variable, so we just have to sum the sensitivities
of $\vec{x}$ in each argument, yielding a combined sensitivity of 4. Unfortunately,
this method of combining the sensitivities of each argument is too coarse when
reasoning with $L^2$ distances, which leads to an imprecise analysis. To obtain a
better bound, we can reason informally as follows. First, take

\[ M = \begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 2 & 0 \end{pmatrix}^{T}. \]


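We can compute the operator norm of $M$ directly:


\[
\|M\| = \sup_{x, y} \frac{\sqrt{2^2 x^2 + y^2 + 2^2 y^2 + x^2}}{\sqrt{x^2 + y^2}}
= \sup_{x, y} \frac{\sqrt{5 x^2 + 5 y^2}}{\sqrt{x^2 + y^2}} = \sqrt{5},
\]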


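which implies that $M$ is a $\sqrt{5}$-sensitive function of type $\mathbb{R}^2 \to \mathbb{R}^4 \cong \mathbb{R}^2 \times \mathbb{R}^2$.
Moreover, thanks to Proposition 3, we can view addition $(+)$ as a $\sqrt{2}$-sensitive
operator of type $\mathbb{R}^2 \to \mathbb{R}$, since


\[ d_{\mathbb{R}}(x_1 + x_2, y_1 + y_2) \leq d_{\mathbb{R}}(x_1, y_1) + d_{\mathbb{R}}(x_2, y_2) = d_1(\vec{x}, \vec{y}) \leq \sqrt{2}\, d_2(\vec{x}, \vec{y}). \]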
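\[
\begin{array}{lrcl}
\text{Types} & \tau, \sigma, \rho & ::= & 1 \mid \mathbb{R} \mid {!_s \tau} \mid \bigcirc_P \tau \mid \bigcirc_H \tau \mid \tau \multimap_p \sigma \mid \tau \otimes_p \sigma \mid \tau \oplus \sigma \qquad (p \in \mathbb{R}^{\infty}_{\geq 1},\ s \in \mathbb{R}^{\infty}_{\geq 0}) \\
\text{Terms} & e & ::= & x \mid r \in \mathbb{R} \mid () \mid \lambda x.\, e \mid e\ e \mid (e, e) \mid \mathbf{let}\ (x, y) = e\ \mathbf{in}\ e \\
 & & \mid & \mathbf{inj}_i\, e \mid (\mathbf{case}\ e\ \mathbf{of}\ x.\, e \mid y.\, e) \mid {!e} \mid \mathbf{let}\ {!x} = e\ \mathbf{in}\ e \\
 & & \mid & \mathbf{mlet}\ x = e\ \mathbf{in}\ e \mid \mathbf{return}\ e \mid \cdots
\end{array}
\]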


Fig. 1. Types and terms in Bunched Fuzz 


Thus, by rewriting the definition of $g$ as $(+) \circ (f \times f) \circ M$, where $f \times f : \mathbb{R}^4 \cong
\mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R} \times \mathbb{R}$ denotes the application of $f$ in parallel, we can compute the
sensitivity of $g$ by multiplying the sensitivity of each stage, as $\sqrt{2} \times 1 \times \sqrt{5} =
\sqrt{10} \approx 3.16$, which is strictly better than the previous bound.
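The two bounds can be compared empirically. The sketch below is only an illustration under stated assumptions: it instantiates the abstract non-expansive function f with the Euclidean norm on R^2 (a hypothetical choice, since the text leaves f unspecified), samples random pairs of inputs, and reports the largest observed expansion factor of M and of g with respect to the L^2 distance.

```python
import math
import random

def norm2(v):
    """Euclidean (L^2) norm of a tuple of reals."""
    return math.sqrt(sum(t * t for t in v))

def f(a, b):
    # Assumed instance of a non-expansive f : R^2 -> R (Euclidean norm).
    return norm2((a, b))

def g(x, y):
    # g(x, y) = f(2x, y) + f(2y, x), as in the running example.
    return f(2 * x, y) + f(2 * y, x)

def apply_m(x, y):
    # M maps (x, y) to (2x, y, 2y, x).
    return (2 * x, y, 2 * y, x)

def max_ratios(trials=10_000, seed=0):
    """Largest observed ratios d(out)/d_2(in) for M and for g."""
    rng = random.Random(seed)
    worst_m, worst_g = 0.0, 0.0
    for _ in range(trials):
        u = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        v = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        d_in = norm2((u[0] - v[0], u[1] - v[1]))
        if d_in == 0.0:
            continue
        diff_m = tuple(a - b for a, b in zip(apply_m(*u), apply_m(*v)))
        worst_m = max(worst_m, norm2(diff_m) / d_in)
        worst_g = max(worst_g, abs(g(*u) - g(*v)) / d_in)
    return worst_m, worst_g

if __name__ == "__main__":
    worst_m, worst_g = max_ratios()
    print("M:", worst_m, "bound sqrt(5) =", math.sqrt(5))
    print("g:", worst_g, "bound sqrt(10) =", math.sqrt(10), "coarse bound = 4")
```

Sampling can only give a lower estimate of the true sensitivity, so this does not replace the argument above; it merely confirms that no sampled pair exceeds the $\sqrt{10}$ bound.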

Naturally, we could further extend Fuzz or Duet with primitives for internal- 
izing this reasoning, but it would be preferable to use the original definition of g 
and automate the low-level reasoning about distances. In this section, we demon- 
strate how this can be done via Bunched Fuzz, a language that refines Fuzz by 
incorporating more general distances in its typing environments. Rather than assuming
that input distances are always combined by addition, or the $L^1$ distance,
Bunched Fuzz allows them to be combined with arbitrary $L^p$ distances. This
refinement allows us to analyze different components of a vector as individual
variables, but also to split the sensitivity of these variables while accounting for
their corresponding vector distances. In the remainder of this section, we present
the syntax and type system of Bunched Fuzz, highlighting the main differences 
with respect to the original Fuzz design. Later, in Section 4, we will give a 
semantics to this language in terms of metric spaces, following prior work [3]. 


Types and Terms Figure 1 presents the grammar of types and the main term
formers of Bunched Fuzz. They are similar to their Fuzz counterparts; in
particular, there are types for real numbers, products, sums, functions, and a unit
type. The main novelty is in the product type $\tau \otimes_p \sigma$, which combines the metrics
of each component using the $L^p$ distance (cf. Section 2.4). The types $\tau \otimes_1 \sigma$ and
$\tau \otimes_\infty \sigma$ subsume the types $\tau \otimes \sigma$ and $\tau \mathbin{\&} \sigma$ in the original Fuzz language. Note
that there is no term constructor or destructor for the Fuzz type $\&$, since it is
subsumed by $\otimes_\infty$. The type $\tau \multimap_p \sigma$ represents non-expansive functions endowed
with a metric that is compatible with the $L^p$ metric, in that currying works (cf.
Section 5). We will sometimes write $\otimes$ for $\otimes_1$ and $\multimap$ for $\multimap_1$.

Another novelty with respect to Fuzz is that there are two constructors for
probability distributions, $\bigcirc_P$ and $\bigcirc_H$. The first one carries the original Fuzz
privacy metric, while the second one carries the Hellinger distance. As we will see
shortly, the composition principle for the Hellinger distance uses a contraction
operator for the $L^2$ distance, which was not available in the original Fuzz design.
Both distribution types feature term constructors mlet and return for sampling 
from a distribution and for injecting values into distributions. To simplify the 
notation, we do not use separate versions of these term formers for each type. 
Bunches Before describing its type system, we need to talk about how typing
environments are handled in Bunched Fuzz. In the spirit of bunched logics, 
environments are bunches defined with the following grammar: 


\[ \Gamma, \Delta ::= \cdot \mid [x : \tau]_s \mid \Gamma \mathbin{,_p} \Delta \]


The empty environment is denoted as $\cdot$. The form $[x : \tau]_s$ states that the variable
$x$ has type $\tau$ and sensitivity $s$. The form $\Gamma \mathbin{,_p} \Delta$ denotes the concatenation of $\Gamma$
and $\Delta$, which is only defined when the two bind disjoint sets of variables. As
we will see in Section 4, bunches will be interpreted as metric spaces, and the index
$p$ denotes which $L^p$ metric we will use to combine the metrics of $\Gamma$ and $\Delta$.

The type system features several operations and relations on bunches, which
are summarized in Figure 2. We write $\Gamma \leftrightsquigarrow \Gamma'$ to indicate that we can obtain $\Gamma'$
from $\Gamma$ by rearranging commas up to associativity and commutativity, and by treating
the empty environment as an identity element; Figure 2 has a precise definition.
Observe that associativity only holds for equal values of p. This operation will 
be used to state a permutation rule for the type system of Bunched Fuzz. 

Like in Fuzz, environments have a scaling operation $s\Gamma$ which scales all
sensitivities in the bunch by $s$. For example,


\[ s([x : \tau]_{r_1} \mathbin{,_p} [y : \sigma]_{r_2}) = ([x : \tau]_{s \cdot r_1} \mathbin{,_p} [y : \sigma]_{s \cdot r_2}). \]


The exact definition of scaling in such graded languages is subtle, since minor 
variations can quickly lead to unsoundness. The definition we are using ($\infty \cdot 0 =
0 \cdot \infty = \infty$), which goes back to prior work [3], is sound, but imprecise, since
it leads to too many variables being marked as $\infty$-sensitive. It would also be
possible to have a more precise variant that uses a non-commutative definition of 
multiplication on distances [4], but we keep the current formulation for simplicity. 
(For a more thorough discussion on these choices and their tradeoffs, see the 
“Zero and Infinity” example in Appendix B of the full version [26] of this paper.) 

In the original Fuzz type system, rules with several premises usually have
their environments combined by adding sensitivities pointwise, which
corresponds to a use of the $L^1$ metric. In Bunched Fuzz, we have instead a family of
contraction operations $\mathrm{Contr}(p, \Gamma, \Delta)$ for combining environments, one for each
$L^p$ metric. Contraction only makes sense if $\Gamma$ and $\Delta$ differ only in sensitivities
and variable names, but have the same structure otherwise. We write this
relation as $\Gamma \approx \Delta$. When contracting two leaves, sensitivities are combined using
the $L^p$ norm, while keeping variable names from the left bunch.

Unlike Fuzz, where contraction is implicit in rules with multiple premises, 
Bunched Fuzz has a separate, explicit contraction typing rule. The rule will be 
stated using the vars function, which lists all variables in a bunch. 
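To make the bunch operations concrete, here is a minimal sketch of bunches, scaling, and contraction as a small tree data structure. It is an illustration rather than the authors' implementation: the class names (`Empty`, `Leaf`, `Comma`), the representation of types as strings, and the function names are all assumptions. The correction factor `c(p, q)` follows our reading of Figure 2 (1 when p is infinite, and 2 raised to |1/q - 1/p| otherwise).

```python
import math
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Empty:
    """The empty bunch."""

@dataclass(frozen=True)
class Leaf:
    """A variable [var : ty]_sens."""
    var: str
    ty: str
    sens: float

@dataclass(frozen=True)
class Comma:
    """The bunch (left ,_p right)."""
    p: float
    left: "Bunch"
    right: "Bunch"

Bunch = Union[Empty, Leaf, Comma]

def mul(s, r):
    # Multiplication on sensitivities with inf * 0 = 0 * inf = inf.
    return math.inf if math.isinf(s) or math.isinf(r) else s * r

def scale(s, b):
    """Scale every sensitivity in the bunch by s."""
    if isinstance(b, Empty):
        return b
    if isinstance(b, Leaf):
        return Leaf(b.var, b.ty, mul(s, b.sens))
    return Comma(b.p, scale(s, b.left), scale(s, b.right))

def norm_p(p, s, r):
    """L^p norm of the pair of sensitivities (s, r)."""
    if math.isinf(s) or math.isinf(r):
        return math.inf
    if math.isinf(p):
        return max(s, r)
    return (s ** p + r ** p) ** (1.0 / p)

def c(p, q):
    # Assumed reading of Figure 2: 1 if p is infinite, else 2^{|1/q - 1/p|}.
    return 1.0 if math.isinf(p) else 2.0 ** abs(1.0 / q - 1.0 / p)

def contr(p, g, d):
    """Contr(p, g, d); defined only when g and d have the same structure."""
    if isinstance(g, Empty) and isinstance(d, Empty):
        return Empty()
    if isinstance(g, Leaf) and isinstance(d, Leaf) and g.ty == d.ty:
        # Keep the variable name of the left bunch.
        return Leaf(g.var, g.ty, norm_p(p, g.sens, d.sens))
    if isinstance(g, Comma) and isinstance(d, Comma) and g.p == d.p:
        inner = Comma(g.p, contr(p, g.left, d.left), contr(p, g.right, d.right))
        return scale(c(p, g.p), inner)
    raise ValueError("bunches do not have the same structure")

if __name__ == "__main__":
    r2 = math.sqrt(2.0)
    gamma = Comma(2, Leaf("x1", "R", 2 * r2), Leaf("y1", "R", r2))
    delta = Comma(2, Leaf("x2", "R", r2), Leaf("y2", "R", 2 * r2))
    print(contr(2, gamma, delta))
```

In the usage example, contracting the two 2-bunches with p = 2 yields leaves for x1 and y1 whose sensitivities are both approximately $\sqrt{10}$, since $\sqrt{8 + 2} = \sqrt{10}$ and the correction factor c(2, 2) is 1.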


Type System Our type system is similar to the one of Fuzz, but adapted to use
bunched environments. The typing rules are displayed in Figure 3. For example,
in the $\otimes I$ rule, notice that the $p$ on the tensor type is carried over to the bunch in
the resulting environment. Similarly, in the $\multimap I$ rule, the value of $p$ that annotates
the bunch in the premise is carried over to the $\multimap_p$ in the conclusion.
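\[
\begin{array}{ll}
\mathrm{vars}(\cdot) = [\,] & \\
\mathrm{vars}([x : \tau]_s) = [x] & \\
\mathrm{vars}(\Gamma_1 \mathbin{,_p} \Gamma_2) = \mathrm{vars}(\Gamma_1) \mathbin{+\!\!+} \mathrm{vars}(\Gamma_2) & \\[1ex]
\cdot \approx \cdot & \\
{[x : \tau]_s} \approx [y : \sigma]_r & \text{if } \tau = \sigma \\
\Gamma_1 \mathbin{,_p} \Gamma_2 \approx \Delta_1 \mathbin{,_q} \Delta_2 & \text{if } p = q \wedge \Gamma_i \approx \Delta_i \\[1ex]
\Gamma \leftrightsquigarrow \cdot \mathbin{,_p} \Delta & \text{if } \Gamma \leftrightsquigarrow \Delta \\
\Gamma \leftrightsquigarrow \Delta \mathbin{,_p} \cdot & \text{if } \Gamma \leftrightsquigarrow \Delta \\
\Gamma_1 \mathbin{,_p} \Gamma_2 \leftrightsquigarrow \Delta_1 \mathbin{,_p} \Delta_2 & \text{if } \Gamma_i \leftrightsquigarrow \Delta_i \\
\Gamma_1 \mathbin{,_p} \Gamma_2 \leftrightsquigarrow \Delta_2 \mathbin{,_p} \Delta_1 & \text{if } \Gamma_i \leftrightsquigarrow \Delta_i \\
\Gamma_1 \mathbin{,_p} (\Gamma_2 \mathbin{,_p} \Gamma_3) \leftrightsquigarrow (\Delta_1 \mathbin{,_p} \Delta_2) \mathbin{,_p} \Delta_3 & \text{if } \Gamma_i \leftrightsquigarrow \Delta_i \\
\Gamma_2 \leftrightsquigarrow \Gamma_1 & \text{if } \Gamma_1 \leftrightsquigarrow \Gamma_2 \\[1ex]
s\, \cdot = \cdot & \\
s\, [x : \tau]_r = [x : \tau]_{s \cdot r} & \\
s\, (\Gamma \mathbin{,_p} \Delta) = s\Gamma \mathbin{,_p} s\Delta & \\[1ex]
c(p, q) = \begin{cases} 1 & \text{if } p = \infty \\ 2^{\left| \frac{1}{q} - \frac{1}{p} \right|} & \text{otherwise} \end{cases} & \\[1ex]
\mathrm{Contr}(p, \cdot, \cdot) = \cdot & \\
\mathrm{Contr}(p, [x : \tau]_s, [y : \tau]_r) = [x : \tau]_{\sqrt[p]{s^p + r^p}} & \\
\mathrm{Contr}(p, (\Gamma_1 \mathbin{,_q} \Gamma_2), (\Delta_1 \mathbin{,_q} \Delta_2)) = c(p, q)\left( \mathrm{Contr}(p, \Gamma_1, \Delta_1) \mathbin{,_q} \mathrm{Contr}(p, \Gamma_2, \Delta_2) \right) &
\end{array}
\]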


Fig. 2. Bunch Operations 


Like in Fuzz, the $!E$ rule propagates the scaling factor, but using the bunch
structure. Rather than adding the two environments, we splice one into the
other: the notation $\Gamma(\Delta)$ denotes a compound bunch where we plug in the
bunch $\Delta$ into another bunch $\Gamma(\star)$ that has a single, distinguished hole $\star$. As
we mentioned earlier, Bunched Fuzz has an explicit typing rule for contraction,
whereas contraction in Fuzz is implicit in rules with multiple premises. Note
also that we have unrestricted weakening. Finally, we have the rules for typing
the return and bind primitives of the probabilistic types $\bigcirc_P$ and $\bigcirc_H$. Those
for $\bigcirc_P$ are adapted from Fuzz, by using contraction instead of adding up the
environments. The ones for $\bigcirc_H$ are similar, but use $L^2$ contraction instead, since
that is the metric that enables composition for the Hellinger distance.

Let us now explain in which sense $\otimes_\infty$ corresponds to the $\&$ connective of
Fuzz. We will need the following lemma:


Lemma 1 (Renaming). Assume that there is a type derivation of $\Gamma \vdash e : \tau$
and that $\Gamma \approx \Gamma'$. Then there exists a derivation of $\Gamma' \vdash e[\mathrm{vars}(\Gamma')/\mathrm{vars}(\Gamma)] : \tau$.


Now, the $\&$ connective in Fuzz supports two operations, projections and pairing.
The connective $\otimes_\infty$ of Bunched Fuzz also supports these operations, but as
derived forms. First, projections can be encoded by defining $\pi_i(e)$ for $i = 1, 2$
as $\mathbf{let}\ (x_1, x_2) = e\ \mathbf{in}\ x_i$. Second, for pairing assume we have two derivations
of $\Gamma \vdash e_i : \sigma_i$ for $i = 1, 2$, and let $\Gamma'$ be an environment obtained from $\Gamma$ by
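\[
\frac{s \geq 1}{[x : \tau]_s \vdash x : \tau}\ \textsc{Axiom}
\qquad
\frac{r \in \mathbb{R}}{\cdot \vdash r : \mathbb{R}}\ \mathbb{R}I
\qquad
\frac{}{\cdot \vdash () : 1}\ 1I
\]
\[
\frac{\Gamma \mathbin{,_p} [x : \tau]_1 \vdash e : \sigma}{\Gamma \vdash \lambda x.\, e : \tau \multimap_p \sigma}\ \multimap I
\qquad
\frac{\Gamma \vdash f : \tau \multimap_p \sigma \qquad \Delta \vdash e : \tau}{\Gamma \mathbin{,_p} \Delta \vdash f\ e : \sigma}\ \multimap E
\]
\[
\frac{\Gamma \vdash e_1 : \tau \qquad \Delta \vdash e_2 : \sigma}{\Gamma \mathbin{,_p} \Delta \vdash (e_1, e_2) : \tau \otimes_p \sigma}\ \otimes I
\qquad
\frac{\Delta \vdash e_1 : \tau \otimes_p \sigma \qquad \Gamma([x : \tau]_s \mathbin{,_p} [y : \sigma]_s) \vdash e_2 : \rho}{\Gamma(s\Delta) \vdash \mathbf{let}\ (x, y) = e_1\ \mathbf{in}\ e_2 : \rho}\ \otimes E
\]
\[
\frac{\Gamma \vdash e : \tau}{\Gamma \vdash \mathbf{inj}_1\, e : \tau \oplus \sigma}\ \oplus_1 I
\qquad
\frac{\Gamma \vdash e : \sigma}{\Gamma \vdash \mathbf{inj}_2\, e : \tau \oplus \sigma}\ \oplus_2 I
\]
\[
\frac{\Gamma \vdash e_1 : \tau \oplus \sigma \qquad \Delta([x : \tau]_s) \vdash e_2 : \rho \qquad \Delta([y : \sigma]_s) \vdash e_3 : \rho}{\Delta(s\Gamma) \vdash \mathbf{case}\ e_1\ \mathbf{of}\ x.\, e_2 \mid y.\, e_3 : \rho}\ \oplus E
\]
\[
\frac{\Gamma \vdash e : \tau}{s\Gamma \vdash {!e} : {!_s \tau}}\ {!I}
\qquad
\frac{\Gamma \vdash e_1 : {!_r \tau} \qquad \Delta([x : \tau]_{r \cdot s}) \vdash e_2 : \sigma}{\Delta(s\Gamma) \vdash \mathbf{let}\ {!x} = e_1\ \mathbf{in}\ e_2 : \sigma}\ {!E}
\]
\[
\frac{\Gamma(\Delta \mathbin{,_p} \Delta') \vdash e : \tau \qquad \Delta \approx \Delta'}{\Gamma(\mathrm{Contr}(p, \Delta, \Delta')) \vdash e[\mathrm{vars}(\Delta')/\mathrm{vars}(\Delta)] : \tau}\ \textsc{Contr}
\qquad
\frac{\Gamma(\cdot) \vdash e : \tau}{\Gamma(\Delta) \vdash e : \tau}\ \textsc{Weak}
\qquad
\frac{\Gamma \vdash e : \tau \qquad \Gamma \leftrightsquigarrow \Gamma'}{\Gamma' \vdash e : \tau}\ \textsc{Exch}
\]
\[
\frac{\Gamma \approx \Delta \qquad \Gamma \vdash e_1 : \bigcirc_P \tau \qquad \Delta \mathbin{,_p} [x : \tau]_s \vdash e_2 : \bigcirc_P \sigma}{\mathrm{Contr}(1, \Gamma, \Delta) \vdash \mathbf{mlet}\ x = e_1\ \mathbf{in}\ e_2 : \bigcirc_P \sigma}\ \textsc{Bind-P}
\qquad
\frac{\Gamma \vdash e : \tau}{\infty\Gamma \vdash \mathbf{return}\ e : \bigcirc_P \tau}\ \textsc{Return-P}
\]
\[
\frac{\Gamma \approx \Delta \qquad \Gamma \vdash e_1 : \bigcirc_H \tau \qquad \Delta \mathbin{,_p} [x : \tau]_s \vdash e_2 : \bigcirc_H \sigma}{\mathrm{Contr}(2, \Gamma, \Delta) \vdash \mathbf{mlet}\ x = e_1\ \mathbf{in}\ e_2 : \bigcirc_H \sigma}\ \textsc{Bind-H}
\qquad
\frac{\Gamma \vdash e : \tau}{\infty\Gamma \vdash \mathbf{return}\ e : \bigcirc_H \tau}\ \textsc{Return-H}
\]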

Fig. 3. Bunched Fuzz typing rules 
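renaming all variables to fresh ones. Then we have $\Gamma \approx \Gamma'$ and thus


\[
\dfrac{\dfrac{\Gamma \vdash e_1 : \sigma_1 \qquad \dfrac{\Gamma \vdash e_2 : \sigma_2 \qquad \Gamma \approx \Gamma'}{\Gamma' \vdash e_2[\mathrm{vars}(\Gamma')/\mathrm{vars}(\Gamma)] : \sigma_2}\ \textsc{Lemma 1}}{\Gamma \mathbin{,_\infty} \Gamma' \vdash (e_1, e_2[\mathrm{vars}(\Gamma')/\mathrm{vars}(\Gamma)]) : \sigma_1 \otimes_\infty \sigma_2}\ \otimes I}{\mathrm{Contr}(\infty, \Gamma, \Gamma') \vdash (e_1, e_2) : \sigma_1 \otimes_\infty \sigma_2}\ \textsc{Contr}
\]


Note that we have defined $\sqrt[\infty]{x^\infty + y^\infty} = \max(x, y)$ by taking the limit of
$\sqrt[p]{x^p + y^p}$ when $p$ goes to infinity, and thus we have $\mathrm{Contr}(\infty, \Gamma, \Gamma') = \Gamma$.
Therefore the pairing rule of $\&$ is derivable for $\otimes_\infty$.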


4 Semantics 


Having defined the syntax of Bunched Fuzz and its type system, we are ready
to present its semantics. We opt for a denotational formulation, where types $\tau$
and bunches $\Gamma$ are interpreted as metric spaces $\llbracket \tau \rrbracket$ and $\llbracket \Gamma \rrbracket$, and a derivation
$\pi$ of $\Gamma \vdash e : \tau$ is interpreted as a non-expansive function $\llbracket \pi \rrbracket : \llbracket \Gamma \rrbracket \to \llbracket \tau \rrbracket$. For
space reasons, we do not provide an operational semantics for the language, but
we foresee no major difficulties in doing so, since the term language is mostly
inherited from Fuzz, which does have a denotational semantics proved sound
with respect to an operational semantics [4].


Types Each type $\tau$ is interpreted as a metric space $\llbracket \tau \rrbracket$ in a compositional fashion,
by mapping each type constructor to the corresponding operation on metric
spaces defined in Figure 4. We now explain these definitions.

The operations of the first four lines of Figure 4 come from prior work on
Fuzz [4,3]. The definition of $\otimes_p$ uses as carrier set the cartesian product, just as
$\otimes$ in previous works, but endows it with the $L^p$ distance, defined in Section 2.4.
In the particular case of $p = 1$, $\otimes_1$ is the same as $\otimes$.

As for $\multimap_p$, we want to define it in such a way that currying and uncurrying
work with respect to $\otimes_p$, which will allow us to justify the introduction and
elimination forms for that connective. For that we first choose as carrier set the
set $A \multimap B$ of non-expansive functions from $A$ to $B$. This set carries the metric


\[
d_{A \multimap_p B}(f, g) = \inf \left\{ r \in \mathbb{R}^{\infty}_{\geq 0} \;\middle|\; \forall x, y \in A,\ d_B(f(x), g(y)) \leq \sqrt[p]{r^p + d_A(x, y)^p} \right\} \tag{11}
\]


This metric is dictated by the type of the application operator in the $L^p$ norm:
$(A \multimap_p B) \otimes_p A \to B$. Intuitively, if $f$ and $g$ are at distance $r$, and we want
application to be non-expansive, we need to satisfy $d_B(f(x), g(y)) \leq \sqrt[p]{r^p + d_A(x, y)^p}$
for every $x, y \in A$. The above definition says that we pick the distance to be the
smallest possible $r$ that makes this work. Note that this choice is forced upon us:
in category-theoretic jargon, the operations of currying and uncurrying, which
are intimately tied to the application operator, correspond to an adjunction
between two functors, which implies that any other metric space that yields a
similar adjunction with respect to $\otimes_p$ must be isomorphic to $\multimap_p$. In particular,
this implies that its metric will be the same as the one of $\multimap_p$.

For $\bigcirc_P A$ and $\bigcirc_H A$ the carrier set is the set $DA$ of discrete distributions
over $A$. As to the metric on the carrier set, the interpretation of $\bigcirc_P$ uses the
max divergence, used in the definition of differential privacy (see Sect. 2.2). The
interpretation of $\bigcirc_H$ uses instead the Hellinger distance (see e.g. [3]):


\[
\mathrm{HD}_A(\mu, \nu) \triangleq \sqrt{\frac{1}{2} \sum_{x \in A} \left| \sqrt{\mu(x)} - \sqrt{\nu(x)} \right|^2} \tag{12}
\]
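As a small illustration, the Hellinger distance between two finite discrete distributions can be computed directly. This is a sketch under assumptions: distributions are represented as Python dictionaries mapping outcomes to probabilities (a hypothetical encoding), and the formula uses the standard normalization with the factor 1/2, under which the distance ranges between 0 and 1.

```python
import math

def hellinger(mu, nu):
    """Hellinger distance between two discrete distributions.

    Distributions are dicts from outcomes to probabilities; outcomes
    missing from a dict are treated as having probability 0.
    """
    support = set(mu) | set(nu)
    total = sum(
        (math.sqrt(mu.get(x, 0.0)) - math.sqrt(nu.get(x, 0.0))) ** 2
        for x in support
    )
    return math.sqrt(0.5 * total)

if __name__ == "__main__":
    fair = {"heads": 0.5, "tails": 0.5}
    biased = {"heads": 0.9, "tails": 0.1}
    print(hellinger(fair, fair))
    print(hellinger(fair, biased))
    print(hellinger({"a": 1.0}, {"b": 1.0}))
```

Identical distributions are at distance 0, and distributions with disjoint supports are at the maximal distance 1.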


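\[
\begin{array}{l|l|l}
\text{Space } X & |X| & d_X(x, y) \\ \hline
1 & \{\star\} & 0 \\
\mathbb{R} & \mathbb{R} & |x - y| \\
{!_s A} & |A| & \begin{cases} s \cdot d_A(x, y) & \text{if } s \neq \infty \\ \infty & \text{if } s = \infty,\ x \neq y \\ 0 & \text{if } s = \infty,\ x = y \end{cases} \\
A \oplus B & |A| + |B| & \begin{cases} d_A(x, y) & \text{if } x, y \in A \\ d_B(x, y) & \text{if } x, y \in B \\ \infty & \text{otherwise} \end{cases} \\
A \otimes_p B & |A| \times |B| & \sqrt[p]{d_A(\pi_1(x), \pi_1(y))^p + d_B(\pi_2(x), \pi_2(y))^p} \\
A \multimap_p B & A \multimap B & \text{cf. Equation (11)} \\
\bigcirc_P A & DA & \mathrm{MD}_A(x, y); \text{ cf. Equation (2)} \\
\bigcirc_H A & DA & \mathrm{HD}_A(x, y); \text{ cf. Equation (12)}
\end{array}
\]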


Fig. 4. Operations on metric spaces for interpreting types 
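Some of the constructions of Figure 4 can be sketched as higher-order functions that build a distance function from the distance functions of the components. The code below is an assumed encoding, not taken from the paper: a metric space is represented only by its distance function, pairs are Python tuples, and elements of a sum are tagged tuples `("inl", a)` or `("inr", b)`.

```python
import math

def real_dist(x, y):
    """Distance on the reals."""
    return abs(x - y)

def scaled(s, d):
    """Distance of !_s A, given the distance d of A."""
    def dist(x, y):
        if math.isinf(s):
            return 0.0 if x == y else math.inf
        base = d(x, y)
        # The paper fixes 0 * inf = inf, whereas Python would return nan.
        return math.inf if math.isinf(base) else s * base
    return dist

def tensor(p, d_a, d_b):
    """Distance of A (x)_p B: the L^p combination of the component distances."""
    def dist(x, y):
        a, b = d_a(x[0], y[0]), d_b(x[1], y[1])
        if math.isinf(a) or math.isinf(b):
            return math.inf
        if math.isinf(p):
            return max(a, b)
        return (a ** p + b ** p) ** (1.0 / p)
    return dist

def coproduct(d_a, d_b):
    """Distance of A (+) B on tagged values ("inl", a) and ("inr", b)."""
    def dist(x, y):
        if x[0] != y[0]:
            return math.inf  # points in different summands are infinitely apart
        return d_a(x[1], y[1]) if x[0] == "inl" else d_b(x[1], y[1])
    return dist

if __name__ == "__main__":
    u, v = (0.0, 0.0), (3.0, 4.0)
    for p in (1, 2, math.inf):
        print(p, tensor(p, real_dist, real_dist)(u, v))
    print(scaled(2.0, tensor(2, real_dist, real_dist))(u, v))
```

The usage example shows the same pair of points at distance 7, 5, and 4 under the tensor products with p = 1, 2, and infinity, respectively.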


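Bunches The interpretation of bunches is similar to that of types. Variables
correspond to scaled metric spaces, whereas $,_p$ corresponds to $\otimes_p$:


\[
\llbracket \cdot \rrbracket = 1 \qquad \llbracket [x : \tau]_s \rrbracket = {!_s \llbracket \tau \rrbracket} \qquad \llbracket \Gamma \mathbin{,_p} \Delta \rrbracket = \llbracket \Gamma \rrbracket \otimes_p \llbracket \Delta \rrbracket.
\]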


One complication compared to prior designs is the use of an explicit exchange 
rule, which is required to handle the richer structure of contexts. Semantically, 
each use of exchange induces an isomorphism of metric spaces: 


Theorem 3. Each derivation of $\Gamma \leftrightsquigarrow \Delta$ corresponds to an isomorphism of
metric spaces $\llbracket \Gamma \rrbracket \cong \llbracket \Delta \rrbracket$.


Before stating the interpretation of typing derivations, we give an overview 
of important properties of the above constructions that will help us prove the 
soundness of the interpretation. 
Scaling Much like in prior work [4,3], we can check the following equations:


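Proposition 4.
\[
{!_{s_1}} {!_{s_2}} A = {!_{s_1 \cdot s_2}} A \qquad {!_s (A \oplus B)} = {!_s A} \oplus {!_s B} \qquad {!_s (A \otimes_p B)} = {!_s A} \otimes_p {!_s B}.
\]


Moreover, an $s$-sensitive function from $A$ to $B$ is the same thing as a
non-expansive function of type ${!_s A} \multimap B$.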


Proposition 5. For every bunch $\Gamma$, we have $\llbracket s\Gamma \rrbracket = {!_s \llbracket \Gamma \rrbracket}$.


Tensors The properties on $L^p$ distances allow us to relate product types with
different values of $p$.


Proposition 6 (Subtyping of tensors).


1. Let $A$, $B$ be two metric spaces and $p, q \in \mathbb{R}^{\infty}_{\geq 1}$ with $p \leq q$. Then the identity
map on pairs belongs to the two following spaces:


\[ A \otimes_p B \multimap A \otimes_q B \qquad {!_{2^{1/p - 1/q}}(A \otimes_q B)} \multimap A \otimes_p B. \]
2. In particular, when $p = 1$ and $q = 2$, the identity map belongs to:

\[ {!_{\sqrt{2}}(A \otimes_2 B)} \multimap A \otimes_1 B. \]


Proof. For (1), the fact that the identity belongs to the first space follows from
the fact that $d_q(x, y) \leq d_p(x, y)$, by Proposition 3 (Equation (9)). The second
claim is derived from Proposition 3 (Equation (9)) in the case $n = 2$.


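Remark 1. Proposition 6 allows us to relate different spaces of functions with
multiple arguments. For example,


\[ (A \otimes_2 B \multimap C) \subseteq (A \otimes_1 B \multimap C) \qquad (A \otimes B \multimap C) \subseteq ({!_{\sqrt{2}}(A \otimes_2 B)} \multimap C). \]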


Bunched Fuzz does not currently exploit these inclusions in any significant way, 
but we could envision extending the system with a notion of subtyping to further 
simplify the use of multiple product metrics in a single program. 


We also have the following result, which is instrumental to prove the sound- 
ness of the contraction rule. 


Proposition 7. Let $X, Y, Z, W$ be metric spaces, and $p, q \in \mathbb{R}^{\infty}_{\geq 1}$ with $p \neq \infty$.
The canonical isomorphism of sets $(X \times Y) \times (Z \times W) \cong (X \times Z) \times (Y \times W)$,
which swaps the second and third components, is a non-expansive function of
type ${!_{c(p,q)}((X \otimes_q Y) \otimes_p (Z \otimes_q W))} \to (X \otimes_p Z) \otimes_q (Y \otimes_p W)$, where $c(p, q)$ is
defined as in Figure 2.
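Proof. First, suppose that $p \leq q$. Then we can write the isomorphism as a
composite of the following non-expansive functions:


\[
\begin{array}{rll}
 & {!_{c(p,q)}((X \otimes_q Y) \otimes_p (Z \otimes_q W))} & \\
\to & {!_{c(p,q)}((X \otimes_q Y) \otimes_q (Z \otimes_q W))} & \text{Proposition 6} \\
\cong & {!_{c(p,q)}((X \otimes_q Z) \otimes_q (Y \otimes_q W))} & \text{assoc., comm. of } \otimes_q \\
= & {!_{c(p,q)}(X \otimes_q Z)} \otimes_q {!_{c(p,q)}(Y \otimes_q W)} & \text{Proposition 4} \\
\to & (X \otimes_p Z) \otimes_q (Y \otimes_p W) & \text{Proposition 6.}
\end{array}
\]


Otherwise, $p > q$, and we reason as follows.


\[
\begin{array}{rll}
 & {!_{c(p,q)}((X \otimes_q Y) \otimes_p (Z \otimes_q W))} & \\
\to & {!_{c(p,q)}((X \otimes_p Y) \otimes_p (Z \otimes_p W))} & \text{Proposition 6} \\
\cong & {!_{c(p,q)}((X \otimes_p Z) \otimes_p (Y \otimes_p W))} & \text{assoc., comm. of } \otimes_p \\
\to & (X \otimes_p Z) \otimes_q (Y \otimes_p W) & \text{Proposition 6.}
\end{array}
\]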


One can then prove the following property: 


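Proposition 8. Suppose that we have two bunches $\Gamma \approx \Delta$. The carrier sets of
$\llbracket \Gamma \rrbracket$ and $\llbracket \Delta \rrbracket$ are the same. Moreover, for any $p$, the diagonal function $\delta(x) =
(x, x)$ is a non-expansive function of type $\llbracket \mathrm{Contr}(p, \Gamma, \Delta) \rrbracket \to \llbracket \Gamma \rrbracket \otimes_p \llbracket \Delta \rrbracket$.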


Function Types The metric on $\multimap_p$ can be justified by the following result:


Proposition 9. For every metric space $X$ and every $p \in \mathbb{R}^{\infty}_{\geq 1}$, there is an
adjunction of type $(-) \otimes_p X \dashv X \multimap_p (-)$ in Met given by currying and uncurrying.
(Both constructions on metric spaces are extended to endofunctors on Met in the
obvious way.)


Because right adjoints are unique up to isomorphism, this definition is a direct
generalization of the metric on functions used in Fuzz [23,4,3], which corresponds
to $\multimap_1$.


Theorem 4. Suppose that $A$ and $B$ are proper metric spaces, and let $f, g : A \to
B$ be non-expansive. Then $d_{A \multimap_1 B}(f, g) = \sup_{x} d_B(f(x), g(x))$.


We conclude with another subtyping result involving function spaces. 


Theorem 5. For all non-expansive functions $f, g \in A \to B$ and $p \geq 1$, we
have $d_{A \multimap_1 B}(f, g) \leq d_{A \multimap_p B}(f, g)$. In particular, the identity function is a
non-expansive function of type $(A \multimap_p B) \to (A \multimap_1 B)$.
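On finite metric spaces the infimum in Equation (11) is attained, so the function-space distance can be computed by brute force and compared with the statements of Theorems 4 and 5. The sketch below is an illustration under assumptions: the domain is a finite set of reals with the usual distance, the codomain is the real line, and the name `fun_dist` is a hypothetical helper.

```python
import math
from itertools import product

def fun_dist(p, points, f, g):
    """Brute-force distance between f and g in the space A -o_p R.

    `points` is a finite set of reals playing the role of A. The result is
    the smallest r such that |f(x) - g(y)| <= (r^p + |x - y|^p)^(1/p)
    for all x, y in `points`.
    """
    best = 0.0
    for x, y in product(points, repeat=2):
        out, inp = abs(f(x) - g(y)), abs(x - y)
        if math.isinf(p):
            # Constraint is out <= max(r, inp).
            r = out if out > inp else 0.0
        else:
            r = max(0.0, out ** p - inp ** p) ** (1.0 / p)
        best = max(best, r)
    return best

def sup_dist(points, f, g):
    """Pointwise supremum distance sup_x |f(x) - g(x)|."""
    return max(abs(f(x) - g(x)) for x in points)

if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0]
    f = lambda x: x          # non-expansive
    g = lambda x: x + 0.5    # non-expansive
    print("sup distance:", sup_dist(pts, f, g))
    print("p = 1:", fun_dist(1, pts, f, g))
    print("p = 2:", fun_dist(2, pts, f, g))
```

In this example the distance for p = 1 coincides with the supremum distance, while the distance for p = 2 is strictly larger, which is consistent with the inequality of Theorem 5.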


Probability Distributions Prior work [3] proves that the return and bind
operations on probability distributions can be seen as non-expansive functions:


\[
\begin{array}{l}
\eta : {!_\infty A} \to \bigcirc_P A \\
(-)^{\dagger}(-) : ({!_\infty A} \multimap_1 \bigcirc_P B) \otimes_1 \bigcirc_P A \to \bigcirc_P B.
\end{array}
\]


These properties ensure the soundness of the typing rules for $\bigcirc_P$ in Fuzz, and
also in Bunched Fuzz. For $\bigcirc_H$, we can use the following composition principle.
Theorem 6. The following types are sound for the monadic operations on
distributions, seen as non-expansive operations, for any $p \geq 1$:


\[
\begin{array}{l}
\eta : {!_\infty A} \to \bigcirc_H A \\
(-)^{\dagger}(-) : ({!_\infty A} \multimap_p \bigcirc_H B) \otimes_2 \bigcirc_H A \to \bigcirc_H B.
\end{array}
\]


Derivations Finally, a derivation tree builds a function from the context's space
to the subject's space. In the following definition, we use the metavariables $\gamma$
and $\delta$ to denote variable assignments, that is, mappings from the variables of
environments $\Gamma$ and $\Delta$ to elements of the corresponding metric spaces. We use
$\gamma(\delta)$ to represent an assignment in $\llbracket \Gamma(\Delta) \rrbracket$ that is decomposed into two
assignments $\gamma(\star)$ and $\delta$ corresponding to the $\Gamma(\star)$ and $\Delta$ portions. Finally, we use the
$\lambda$-calculus notation $f\ x$ to denote a function $f$ being applied to the value $x$.


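Definition 1. Given a derivation $\pi$ proving $\Gamma \vdash e : \tau$, its interpretation $\llbracket \pi \rrbracket \in
\llbracket \Gamma \rrbracket \multimap \llbracket \tau \rrbracket$ is given by structural induction on $\pi$ as follows: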


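\[
\begin{array}{ll}
\llbracket \textsc{Axiom} \rrbracket \triangleq \lambda x.\, x &
\llbracket \mathbb{R}I \rrbracket \triangleq \lambda ().\, r \in \mathbb{R} \\
\llbracket {\multimap} I\ \pi \rrbracket \triangleq \lambda \gamma.\, \lambda x.\, \llbracket \pi \rrbracket (\gamma, x) &
\llbracket {\multimap} E\ \pi_1\ \pi_2 \rrbracket \triangleq \lambda (\gamma, \delta).\, \llbracket \pi_1 \rrbracket\, \gamma\, (\llbracket \pi_2 \rrbracket\, \delta) \\
\llbracket 1I \rrbracket \triangleq \lambda ().\, () &
\llbracket {\otimes} I\ \pi_1\ \pi_2 \rrbracket \triangleq \lambda (\gamma, \delta).\, (\llbracket \pi_1 \rrbracket\, \gamma, \llbracket \pi_2 \rrbracket\, \delta) \\
\llbracket {\otimes} E\ \pi_1\ \pi_2 \rrbracket \triangleq \lambda \gamma(\delta).\, \llbracket \pi_2 \rrbracket\, \gamma(\llbracket \pi_1 \rrbracket\, \delta) & \\
\llbracket {\oplus_i} I\ \pi \rrbracket \triangleq \lambda \gamma.\, \mathrm{inj}_i\, (\llbracket \pi \rrbracket\, \gamma) &
\llbracket {\oplus} E\ \pi_1\ \pi_2\ \pi_3 \rrbracket \triangleq \lambda \delta(\gamma).\, [\llbracket \pi_2 \rrbracket, \llbracket \pi_3 \rrbracket]\, (\delta(\llbracket \pi_1 \rrbracket\, \gamma)) \\
\llbracket {!I}\ \pi \rrbracket \triangleq \llbracket \pi \rrbracket &
\llbracket {!E}\ \pi_1\ \pi_2 \rrbracket \triangleq \lambda \delta(\gamma).\, \llbracket \pi_2 \rrbracket\, \delta(\llbracket \pi_1 \rrbracket\, \gamma) \\
\llbracket \textsc{Contr}\ \pi \rrbracket \triangleq \lambda \gamma(\delta).\, \llbracket \pi \rrbracket\, \gamma(\delta, \delta) &
\llbracket \textsc{Weak}\ \pi \rrbracket \triangleq \lambda \gamma(\delta).\, \llbracket \pi \rrbracket\, \gamma(()) \\
\llbracket \textsc{Exch}\ \pi \rrbracket \triangleq \lambda \gamma'.\, \llbracket \pi \rrbracket\, \phi^{-1}_{\Gamma'/\Gamma}(\gamma') &
\llbracket \textsc{Bind-P}\ \pi_1\ \pi_2 \rrbracket \triangleq \lambda \gamma.\, (\lambda x.\, \llbracket \pi_2 \rrbracket (\gamma, x))^{\dagger} (\llbracket \pi_1 \rrbracket\, \gamma) \\
\llbracket \textsc{Return-P}\ \pi \rrbracket \triangleq \lambda \gamma.\, \eta(\llbracket \pi \rrbracket\, \gamma) &
\end{array}
\]


where in $\llbracket \textsc{Exch}\ \pi \rrbracket$, the map $\phi_{\Gamma'/\Gamma}$ is the isomorphism defined by Theorem 3,
and for the last two cases see the definitions in equations (3) and (4) (Bind-H and
Return-H are defined in the same way).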


Theorem 7 (Soundness). Given a derivation $\pi$ proving $\Gamma \vdash e : \tau$, then $\llbracket \pi \rrbracket$
is a non-expansive function from the space $\llbracket \Gamma \rrbracket$ to the space $\llbracket \tau \rrbracket$.


5 Examples 
We now look at examples of programs that illustrate the use of $L^p$ metrics.


Currying and Uncurrying Let us illustrate the use of higher-order functions with 
combinators for currying and uncurrying. 


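curry : $((\tau \otimes_p \sigma) \multimap_p \rho) \multimap (\tau \multimap_p \sigma \multimap_p \rho)$
curry f x y = f (x, y)
uncurry : $(\tau \multimap_p \sigma \multimap_p \rho) \multimap ((\tau \otimes_p \sigma) \multimap_p \rho)$
uncurry f z = let (x, y) = z in f x y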
Note that the indices on $\otimes$ and $\multimap$ need to be the same. The reason can be traced
back to the $\multimap E$ rule (cf. Figure 3), which uses the $,_p$ connective to eliminate
$\multimap_p$ (cf. the currying and uncurrying derivation in the appendix of the full paper
for a detailed derivation). If the indices do not agree, currying is not possible; in
other words, we cannot in general soundly curry a function of type $\tau \otimes_p \sigma \multimap_q \rho$
to obtain something of type $\tau \multimap_p \sigma \multimap_q \rho$. However, if $q \leq p$, note that it would
be possible to soundly view $\tau \otimes_q \sigma$ as a subtype of $\tau \otimes_p \sigma$, thanks to Proposition 6.
In this case, we could then convert from $\tau \otimes_p \sigma \multimap_q \rho$ to $\tau \otimes_q \sigma \multimap_q \rho$ (note the
variance), and then curry to obtain a function of type $\tau \multimap_q \sigma \multimap_q \rho$.


Precise sensitivity for functions with multiple arguments Another useful feature 
of Bunched Fuzz is that its contraction rule allows us to split sensitivities more 
accurately than if we used the contraction rule that is derivable in the original 
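Fuzz. Concretely, suppose that we have a program $\lambda p.\, \mathbf{let}\ (x, y) = p\ \mathbf{in}\ f(x, y) +
g(x, y)$, where $f$ and $g$ have types $f : ({!_2 \mathbb{R}}) \otimes_2 \mathbb{R} \multimap \mathbb{R}$ and $g : \mathbb{R} \otimes_2 ({!_2 \mathbb{R}}) \multimap \mathbb{R}$,
and where we have elided the wrapping and unwrapping of ! types, for simplicity.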

Let us sketch how this program is typed in Bunched Fuzz. Addition belongs to
$\mathbb{R} \otimes_1 \mathbb{R} \multimap \mathbb{R}$, so by Proposition 6 it can also be given the type ${!_{\sqrt{2}}(\mathbb{R} \otimes_2 \mathbb{R})} \multimap \mathbb{R}$.
Thus, we can build the following derivation for the body of the program:


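\[
\frac{\Gamma \vdash f(x_1, y_1) + g(x_2, y_2) : \mathbb{R}}{[x : \mathbb{R}]_{\sqrt{10}} \mathbin{,_2} [y : \mathbb{R}]_{\sqrt{10}} \vdash f(x, y) + g(x, y) : \mathbb{R}}\ \textsc{Contr}
\]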


where $\Gamma = ([x_1 : \mathbb{R}]_{2\sqrt{2}} \mathbin{,_2} [y_1 : \mathbb{R}]_{\sqrt{2}}) \mathbin{,_2} ([x_2 : \mathbb{R}]_{\sqrt{2}} \mathbin{,_2} [y_2 : \mathbb{R}]_{2\sqrt{2}})$, and where
we used contraction twice to merge the $x$s and $y$s. Note that $\|(2\sqrt{2}, \sqrt{2})\|_2 =
\sqrt{8 + 2} = \sqrt{10}$, which is why the final sensitivities have this form. By contrast,
consider how we might attempt to type this program directly in the original
Fuzz. Let us assume that we are working in an extension of Fuzz with types for
expressing the domains of $f$ and $g$, similarly to the $L^2$ vector types of Duet [20].
Moreover, let us assume that we have coercion functions that allow us to cast
from $({!_2 \mathbb{R}}) \otimes_2 ({!_2 \mathbb{R}})$ to $({!_2 \mathbb{R}}) \otimes_2 \mathbb{R}$ and $\mathbb{R} \otimes_2 ({!_2 \mathbb{R}})$. If we have a pair $p : {!_2(({!_2 \mathbb{R}}) \otimes_2
({!_2 \mathbb{R}}))}$, we can split its sensitivity to call $f$ and $g$ and then combine their results
with addition. However, this type is equivalent to ${!_4(\mathbb{R} \otimes_2 \mathbb{R})}$, which means that
the program was given a worse sensitivity (since $\sqrt{10} < 4$). Of course, it would
also have been possible to extend Fuzz with a series of primitives that implement 
precisely the management of sensitivities performed by bunches. However, here 
this low-level reasoning is handled directly by the type system. 


Programming with matrices The Duet language [20] provides several matrix
types with the $L^1$, $L^2$, or $L^\infty$ metrics, along with primitive functions for
manipulating them. In Bunched Fuzz, these types can be defined directly as follows:
$\mathbb{M}_p[m, n] = \otimes_1^m \otimes_p^n \mathbb{R}$. Following Duet, we use the $L^1$ distance to combine the
rows and the $L^p$ distance to combine the columns. One advantage of having
types for matrices defined in terms of more basic constructs is that we can pro- 
gram functions for manipulating them directly, without resorting to separate 
primitives. For example, we can define the following terms in the language:


addrow : $\mathbb{M}_p[1, n] \otimes_1 \mathbb{M}_p[m, n] \multimap \mathbb{M}_p[m + 1, n]$
addcolumn : $\mathbb{M}_1[1, m] \otimes_1 \mathbb{M}_1[m, n] \multimap \mathbb{M}_1[m, n + 1]$
addition : $\mathbb{M}_1[m, n] \otimes_1 \mathbb{M}_1[m, n] \multimap \mathbb{M}_1[m, n]$.


The first program, addrow, appends a vector, represented as a $1 \times n$ matrix, to
the first row of an $m \times n$ matrix. The second program, addcolumn, is similar,
but appends the vector as a column rather than a row. Because of that, it is
restricted to $L^1$ matrices. Finally, the last program, addition, adds the elements
of two matrices pointwise.


Vector addition over sets Let us now show an example of a Fuzz term for which
using $L^p$ metrics allows us to obtain a finer sensitivity analysis. We consider sets
of vectors in $\mathbb{R}^d$ and the function vectorSum which, given such a set, returns
the vectorial sum of its elements. In Fuzz, this function can be defined via a
summation primitive sum : ${!_\infty({!_\infty \tau} \multimap \mathbb{R})} \multimap \mathrm{set}\ \tau \multimap \mathbb{R}$, which adds up the
results of applying a function to each element of a set [23]. The definition is:


vectorSum : ${!_d\, \mathrm{set}(\otimes_1^d \mathbb{R})} \multimap_1 \otimes_1^d \mathbb{R}$


vectorSum s = (sum $\pi_1$ s, ..., sum $\pi_d$ s).


Here, $\pi_i : \otimes_1^d \mathbb{R} \multimap \mathbb{R}$ denotes the $i$-th projection, which can be defined by
destructing a product. Set types in Fuzz are equipped with the Hamming metric
[23], where the distance between two sets is the number of elements by which
they differ. Note that, to ensure that sum has bounded sensitivity, we need to
clip the results of its function argument to the interval $[-1, 1]$. Fuzz infers a
sensitivity of $d$ for this function because its argument is used with sensitivity
1 in each component of the tuple. In Bunched Fuzz, we can define the same
function as above, but we also have the option of using a different $L^p$ distance
to define vectorSum, which leads to the type ${!_{d^{1/p}}\, \mathrm{set}(\otimes_p^d \mathbb{R})} \multimap \otimes_p^d \mathbb{R}$, with a
sensitivity of $d^{1/p}$. For the sake of readability, we will show how this term is typed
in the case $d = 2$. By typing each term (sum $\pi_i$ $z_i$) and applying ($\otimes I$) we get:


\[ [z_1 : \mathrm{set}(\mathbb{R} \otimes_p \mathbb{R})]_1 \mathbin{,_p} [z_2 : \mathrm{set}(\mathbb{R} \otimes_p \mathbb{R})]_1 \vdash (\mathrm{sum}\ \pi_1\ z_1, \mathrm{sum}\ \pi_2\ z_2) : \mathbb{R} \otimes_p \mathbb{R}. \]


By applying contraction we get: $[z : \mathrm{set}(\mathbb{R} \otimes_p \mathbb{R})]_{2^{1/p}} \vdash (\mathrm{sum}\ \pi_1\ z, \mathrm{sum}\ \pi_2\ z) :
\mathbb{R} \otimes_p \mathbb{R}$. The claimed type is finally obtained by ($!E$) and ($\multimap I$).
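The $d^{1/p}$ bound can be observed on concrete data. The following sketch is an illustration, not the Fuzz or Bunched Fuzz program itself: sets are Python frozensets of tuples, the Hamming distance is taken to be the size of the symmetric difference, and `vector_sum` clips each coordinate to the interval [-1, 1] before summing, as described above. All function names are hypothetical.

```python
import math

def clip(t):
    """Clip a real number to the interval [-1, 1]."""
    return max(-1.0, min(1.0, t))

def vector_sum(s, d):
    """Coordinate-wise sum of the (clipped) vectors in the set s."""
    return tuple(sum(clip(v[i]) for v in s) for i in range(d))

def lp_dist(u, v, p):
    """L^p distance between two vectors of reals."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if math.isinf(p):
        return max(diffs, default=0.0)
    return sum(t ** p for t in diffs) ** (1.0 / p)

def hamming(s, t):
    """Number of elements by which the two sets differ."""
    return len(s ^ t)

def within_bound(s, t, d, p, tol=1e-9):
    """Check that vector_sum is d^(1/p)-sensitive on the pair (s, t)."""
    lhs = lp_dist(vector_sum(s, d), vector_sum(t, d), p)
    return lhs <= d ** (1.0 / p) * hamming(s, t) + tol

if __name__ == "__main__":
    d = 3
    s1 = frozenset({(0.5, -0.2, 1.0)})
    s2 = s1 | {(1.0, 1.0, 1.0)}
    for p in (1, 2, math.inf):
        lhs = lp_dist(vector_sum(s1, d), vector_sum(s2, d), p)
        print(p, lhs, d ** (1.0 / p) * hamming(s1, s2))
```

Adding the all-ones vector changes every coordinate of the sum by 1, so in this example the output distance essentially matches the bound $d^{1/p}$, which suggests the bound cannot be improved in general.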


Computing distances Suppose that the type $X$ denotes a proper metric space
(that is, where the triangle inequality holds). Then we can incorporate its
distance function in Bunched Fuzz with the type $X \otimes_1 X \multimap \mathbb{R}$. Indeed, let $x$, $x'$,
$y$ and $y'$ be arbitrary elements of $X$. Then


\[
\begin{aligned}
d_X(x, y) - d_X(x', y') &\leq d_X(x, x') + d_X(x', y') + d_X(y', y) - d_X(x', y') \\
&= d_X(x, x') + d_X(y, y') = d_1((x, y), (x', y')).
\end{aligned}
\]
By symmetry, we also know that $d_X(x', y') - d_X(x, y) \leq d_1((x, y), (x', y'))$.
Combined, these two facts show


\[ d_{\mathbb{R}}(d_X(x, y), d_X(x', y')) = |d_X(x, y) - d_X(x', y')| \leq d_1((x, y), (x', y')), \]


which proves that dx is indeed a non-expansive function. 


Calibrating noise to $L^p$ distance Hardt and Talwar [17] have proposed a
generalization of the Laplace mechanism, called the K-norm mechanism, to create a
differentially private variant of a database query $f : \mathrm{db} \to \mathbb{R}^d$. The difference is
that the amount of noise added is calibrated to the sensitivity of $f$ measured with
the K norm, as opposed to the $L^1$ distance used in the original Laplace
mechanism. When K corresponds to the $L^p$ norm, we will call this the $L^p$-mechanism,
following Awan and Slavkovich [1].


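Definition 2. Given $f : \mathrm{db} \to \mathbb{R}^d$ with $L^p$ sensitivity $s$ and $\epsilon > 0$, the
$L^p$-mechanism is a mechanism that, given a database $D \in \mathrm{db}$, returns a probability
distribution over $y \in \mathbb{R}^d$ with density given by:


\[
\frac{\exp\left( \frac{-\epsilon \|f(D) - y\|_p}{2s} \right)}{\int_y \exp\left( \frac{-\epsilon \|f(D) - y\|_p}{2s} \right) dy}
\]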


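This mechanism returns with high probability (which depends on $\epsilon$ and on the
sensitivity $s$) a vector $y \in \mathbb{R}^d$ which is close to $f(D)$ in $L^p$ distance. Such a
mechanism can be easily integrated in Bunched Fuzz through a primitive:


LpMech : ${!_\infty({!_s \mathrm{dB}} \multimap \otimes_p^d \mathbb{R})} \multimap {!_\epsilon \mathrm{dB}} \multimap \bigcirc_P(\otimes_p^d \mathbb{R})$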


(Strictly speaking, we would need some discretized version of the above distribution to incorporate the mechanism in Bunched Fuzz, but we will ignore this issue in what follows.) The fact that LpMech satisfies $\varepsilon$-differential privacy follows from the fact that this mechanism is an instance of the exponential mechanism [18], a basic building block of differential privacy. It is based on a scoring function assigning a score to every pair consisting of a database and a potential output, and it attempts to return an output with approximately maximal score, given the input database. As shown by Gaboardi et al. [13], the exponential mechanism can be added as a primitive to Fuzz with type:


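\[ \mathrm{expmech} : {!_\infty}\mathrm{set}(\mathcal{O}) \multimap {!_\infty}({!_\infty}\mathcal{O} \multimap {!_s}\mathrm{dB} \multimap \mathbb{R}) \multimap {!_\varepsilon}\mathrm{dB} \multimap \bigcirc_P \mathcal{O}, \]

where $\mathcal{O}$ is the type of outputs. The function LpMech is an instance of the exponential mechanism where $\mathcal{O}$ is $\otimes_p^d \mathbb{R}$ and the score is $\lambda y.\lambda D.\ \|f(D) - y\|_p$.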

To define the $L^p$ mechanism with this recipe, we need to reason about the sensitivity of this scoring function. In Fuzz, this would not be possible, since the language does not support reasoning about the sensitivity of $f$ measured in the $L^p$ distance. In Bunched Fuzz, however, this can be done easily. Below, we will see an example (Gradient descent) of how the $L^p$ mechanism can lead to a finer privacy guarantee.
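
To make the recipe concrete, here is a minimal sketch of an exponential mechanism over a finite candidate set, which plays the role of the discretisation mentioned above. It rests on several assumptions not taken from the paper: the function names are illustrative, the weighting is the textbook $\exp(\varepsilon \cdot \mathrm{score} / (2s))$, and the score is taken as the negated $L^p$ distance so that candidates closer to $f(D)$ score higher.

```python
import math
import random

def lp_norm(v, p):
    """L^p norm of a finite vector, for p >= 1."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def lp_mech_probs(f_D, candidates, eps, sens, p):
    """Selection probabilities proportional to exp(eps * score / (2 * sens)),
    with score(y) = -||f(D) - y||_p (sign convention assumed here)."""
    scores = [-lp_norm([a - b for a, b in zip(f_D, y)], p) for y in candidates]
    top = max(scores)  # shift by the maximum for numerical stability
    weights = [math.exp(eps * (sc - top) / (2.0 * sens)) for sc in scores]
    total = sum(weights)
    return [w / total for w in weights]

def lp_mech_sample(f_D, candidates, eps, sens, p, rng):
    """Draw one candidate according to the probabilities above."""
    probs = lp_mech_probs(f_D, candidates, eps, sens, p)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

A call such as `lp_mech_sample((1.0, 1.2), grid, 1.0, 1.0, 2.0, random.Random(0))` returns one element of `grid`, favouring points near `(1.0, 1.2)`.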

Gradient descent Let us now give an example where we use the $L^p$ mechanism. A differentially private gradient descent algorithm for a linear model in Fuzz was given in [25] (see Sect. 4.1, 4.2 and Fig. 6 p. 16, Fig. 8 p. 19). This algorithm proceeds by iteration. Actually it was given for an extended language called Adaptive Fuzz, but the code already gives an algorithm in (plain) Fuzz. We refer the reader to this reference for the description of all functions, and here we will only describe how one can adapt the algorithm to Bunched Fuzz.

Given a set of $n$ records $x_i \in \mathbb{R}^d$, each with a label $y_i \in \mathbb{R}$, the goal is to find a parameter vector $\theta \in \mathbb{R}^d$ that minimizes the difference between the labels and their estimates, where the estimate of a label $y_i$ is the inner product $\langle x_i, \theta \rangle$. That is, the goal is to minimize the loss function $L(\theta, (x, y)) = \frac{1}{n} \cdot \sum_{i=1}^{n} (\langle x_i, \theta \rangle - y_i)^2$. The algorithm starts with an initial parameter vector $(0, \dots, 0)$ and it iteratively produces successive $\theta$ vectors until a termination condition is reached.

The Fuzz program uses the data-type bag $\tau$ representing bags or multisets over $\tau$. A bagmap primitive is given for it. The type $I$ is the unit interval $[0, 1]$. The main function is called updateParameter and updates one component of the model $\theta$; it is computed in the following way:


— with the function calcGrad : db → R, compute a component $(\nabla L(\theta, (x, y)))_j$ of the $\mathbb{R}^d$ vector $\nabla L(\theta, (x, y))$.⁹

— then Laplacian noise is postcomposed with calcGrad in the updateParameter function. This uses a privacy budget of $2\varepsilon$. It has to be done for each one of the $d$ components of $\nabla L(\theta, (x, y))$, thus on the whole, for one step, a privacy budget of $2d\varepsilon$.

— The iterative procedure of gradient descent is given by the function gradient in Fig. 8 p. 19 of [25]. We forget here about the adaptive aspect and just consider iteration with a given number $n$ of steps. In this case by applying $n$ times updateParameter one gets a privacy budget of $2dn\varepsilon$.


We modify the program as follows to check it in Bunched Fuzz and use the $L^p$-mechanism. Instead of computing over $\mathbb{R}$ we want to compute over $\otimes_p^d \mathbb{R}$ for a given $p \ge 1$, so $\mathbb{R}^d$ equipped with $L^p$ distance. The records $x_i$ are in $\otimes_p^d I$ and the labels $y_i$ in $I$. The database type is $\mathrm{dB} = \mathrm{bag}\,(I \otimes_p (\otimes_p^d I))$. The distance between two bags in dB is the number of elements by which they differ.

We assume a primitive bagVectorSum with type ${!_{d^{1/p}}}\mathrm{bag}\,(\otimes_p^d I) \multimap \otimes_p^d \mathbb{R}$ (it could be defined as the vectorSum defined above for sets, using a sum primitive for bags). Given a bag $m$, (bagVectorSum $m$) returns the vectorial sum of all elements of $m$. We can check that the sensitivity of bagVectorSum is indeed $d^{1/p}$ because given two bags $m$ and $m'$ that are at distance 1, if we denote by $u$ the vector by which they differ, we have:

\[ d_{\otimes_p^d \mathbb{R}}(\mathrm{bagVectorSum}(m), \mathrm{bagVectorSum}(m')) = \|u\|_p \le \Big(\sum_{j=1}^{d} 1\Big)^{1/p} = d^{1/p} \]
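
The bound can be illustrated numerically. The following sketch (illustrative names, with bags modelled as Python lists of vectors in $[0,1]^d$, both assumptions made here) adds one extra record to a random bag and measures how far the vector sum moves in $L^p$ distance; the observed gap should never exceed $d^{1/p}$.

```python
import random

def lp_norm(v, p):
    """L^p norm of a finite vector, for p >= 1."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def bag_vector_sum(bag, d):
    """Componentwise sum of all d-dimensional vectors in the bag."""
    total = [0.0] * d
    for v in bag:
        for j in range(d):
            total[j] += v[j]
    return total

def max_observed_gap(d=5, p=2.0, trials=2000, seed=1):
    """Largest L^p distance seen between the sums of two bags that differ
    by exactly one record with components in [0, 1]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        bag = [[rng.random() for _ in range(d)] for _ in range(10)]
        extra = [rng.random() for _ in range(d)]
        a = bag_vector_sum(bag, d)
        b = bag_vector_sum(bag + [extra], d)
        worst = max(worst, lp_norm([x - y for x, y in zip(a, b)], p))
    return worst
```

The worst case is reached by the all-ones vector, whose $L^p$ norm is exactly $d^{1/p}$.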


By adapting the calcGrad Fuzz term of [25] using bagVectorSum we obtain a term VectcalcGrad with the Bunched Fuzz type ${!_\infty}\otimes_p^d \mathbb{R} \multimap {!_{d^{1/p}}}\mathrm{dB} \multimap \otimes_p^d \mathbb{R}$.


⁹ Actually calcGrad computes $(\nabla L(\theta, (x, y)))_j$ up to a multiplicative constant, $2/n$, which is multiplied afterwards in the updateParameter function.

Given a vector $\theta$ and a database $(y, x)$, VectcalcGrad computes the updated vector $\theta'$. Finally we define the term updateVector by adding noise to VectcalcGrad using the $L^p$-mechanism. Recall the type of LpMech: ${!_\infty}({!_s}\mathrm{dB} \multimap \otimes_p^d \mathbb{R}) \multimap {!_\varepsilon}\mathrm{dB} \multimap \bigcirc_P(\otimes_p^d \mathbb{R})$. We define updateVector and obtain its type as follows:

\[ \mathrm{updateVector} = \lambda\theta.(\mathrm{LpMech}\ (\mathrm{VectcalcGrad}\ \theta)) : {!_\infty}\otimes_p^d \mathbb{R} \multimap {!_\varepsilon}\mathrm{dB} \multimap \bigcirc_P(\otimes_p^d \mathbb{R}) \]

By iterating updateVector $n$ times one obtains a privacy budget of $n\varepsilon$.
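
The two accounting results can be put side by side with a few lines of arithmetic. This is only a sketch of the budgets quoted in the text (the function names are illustrative), not an analysis of either algorithm.

```python
def budget_componentwise(d, n, eps):
    """Budget of the Fuzz program: Laplace noise on each of the d components,
    costing 2*eps per component, repeated for n steps."""
    return 2 * d * n * eps

def budget_lp_mechanism(n, eps):
    """Budget of the Bunched Fuzz program: one L^p-mechanism call per step."""
    return n * eps

def savings_factor(d, n, eps):
    """Ratio of the two budgets, which works out to 2*d."""
    return budget_componentwise(d, n, eps) / budget_lp_mechanism(n, eps)
```

For a model with $d = 10$ components, the componentwise budget is twenty times larger than the vector-valued one for the same $\varepsilon$ and number of steps.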


6 Implementation 


To experiment with the Bunched Fuzz design, we implemented a prototype for 
a fragment of the system based on DFuzz [13,2].¹⁰ The type-checker generates 
a set of numeric constraints that serve as verification conditions to guarantee a 
valid typing. The implementation required adapting some of the current rules 
to an algorithmic formulation (found in the full version). In addition to the 
modifications introduced in the DFuzz type checker compared to its original 
version [13,2], we also made the following changes and simplifications: 


— We did not include explicit contraction and weakening rules. Instead, the 
rules are combined with those for checking other syntactic constructs. To 
do away with an explicit contraction rule, in rules that have multiple antecedents, such as the $\otimes I$ rule, we used the Contr operator to combine the antecedents’ environments, rather than using the p-concatenation operator for bunches. 

— We did not include the rules for checking probabilistic programs with the 
Hellinger distance. 

— Bound variables are always added at the top of the current environment, 
as in the $\multimap I$ rule of the original rules; it is not possible to introduce new 
variables arbitrarily deep in the environment. 


While, strictly speaking, the resulting system is incomplete with respect to the 
rules presented here, it is powerful enough to check an implementation of K- 
means that generalizes a previous version implemented for Fuzz [23]. On the 
other hand, because our implementation is based on the one of DFuzz, which 
features dependent types, we allow functions that are polymorphic on types, sizes 
and p parameters, which allows us to infer sensitivity information that depends 
on run-time sizes. 


7 Related Work 


Bunched Fuzz is inspired by BI, the logic of bunched implications [22], which 
has two connectives for combining contexts. Categorically, one of these connec- 
tives corresponds to a Cartesian product, whereas the other corresponds to a 


¹⁰ https://github.com/junewunder/bunched-fuzz 
monoidal, or tensor product. While related to linear logic, the presence of the 
two context connectives allows BI to derive some properties that are not valid 
in linear logic. For example, the cartesian product does not distribute over sums 
in linear logic but it does distribute over sums in BI. 


We have shown how the rules for such type systems are reminiscent of the 
ones used in type systems for the calculus of bunched implications [21], and 
for reasoning about categorial grammars [19]. Specifically, O’Hearn introduces 
a type system with two products and two arrows [21]. Typing environments are 
bunches of variable assignments with two constructors, corresponding to the two 
products. Our work can be seen as a generalization of O’Hearn’s work to handle 
multiple products and to reason about program sensitivity. 


Moot and Retoré [19, Chapter 5] introduce the multimodal Lambek calculus, which extends the non-associative Lambek calculus, a classical tool for describing categorial grammars. This generalization uses an indexed family of connectives and trees to represent environments. The main differences with our work are: our indexed products are associative and commutative, while theirs are not; our type system is affine; our type system includes a monad for probabilities which has no corresponding construction in their logic; and our type system also possesses the graded comonad $!_s$, corresponding to the ! modality of linear logic. The interaction between this comonad and the bunches is non-trivial and requires us to explicitly define a notion of contraction. Although the main properties we study, metric interpretation and program sensitivity, are very different from the ones studied by the above authors, there are some striking similarities between the two systems. 


A recent work by Bao et al. [5] introduced a novel bunched logic with indexed 
products and magic wands with a preorder between the indices. This logic is used 
as the assertion logic of a separation logic introduced to reason about negative 
dependence between random variables. The connectives studied in this work 
share some similarities with the ones we study here and it would be interesting to 
investigate further the similarities, especially from a model-theoretic perspective. 


Because contexts in the original Fuzz type system are biased towards the $L^1$ distance, it is not obvious how Fuzz could express the composition principles of the Hellinger distance. Recent work showed how this could be amended via a path construction that recasts relational program properties as sensitivity properties [3]. Roughly speaking, instead of working directly with the Hellinger distance $d_H$, the authors consider a family of relations $R_\alpha = \{ (\mu_1, \mu_2) \mid d_H(\mu_1, \mu_2) \le \alpha \}$. Such a relation induces another metric on distributions, $d_{\alpha,H}$, where the distance between two distributions is the length of the shortest path connecting them in the graph corresponding to $R_\alpha$. This allows them to express the composition principles of the Hellinger distance directly in the Fuzz type system, albeit at a cost: the type constructor for probability distributions is graded by the distance bound $\alpha$. Thus, the sensitivity information of a randomized algorithm with respect to the Hellinger distance must also be encoded in the codomain of the function, as opposed to using just its domain, as done for the original privacy metric of Fuzz. By contrast, Bunched Fuzz does not require the grading $\alpha$ because it can express the composition principle of the Hellinger distance directly, thanks to the use of the $L^2$ distance on bunches. 

Duet [20] can be seen as an extension of Fuzz to deal with more general privacy distances. It consists of a two-layer language: a sensitivity language and a privacy language. The sensitivity language is very similar to Fuzz. However, it also contains some basic primitives to manage vectors and matrices. As in Fuzz, the vector types come with multiple distances; unlike Fuzz, Duet also uses the $L^2$ distance. The main reason for this is that Duet also supports the Gaussian mechanism, which calibrates the noise to the $L^2$ sensitivity of the function. Our work is inspired by this aspect of Duet, but it goes beyond it by giving a logical foundation to $L^p$ vector distances. Another language inspired by Fuzz is the recently proposed Jazz [24]. Like Duet, this language has two products and primitives tailored to the $L^2$ sensitivity of functions for the Gaussian mechanism. Interestingly, this language uses contextual information to achieve more precise bounds on the sensitivities. The semantics of Jazz is different from the metric semantics we study here; however, it would be interesting to explore whether a similar contextual approach could also be used in a metric setting. 


8 Conclusion and Future work 


In this work we have introduced Bunched Fuzz, a type system for reasoning about program sensitivity in the style of Fuzz [23]. Bunched Fuzz extends the type theory of Fuzz by considering new type constructors for $L^p$ distances and bunches to manage different products in typing environments. We have shown how this type system supports reasoning about both deterministic and probabilistic programs. 

There are at least two directions that we would like to explore in future work. On the one hand, we would like to understand if the typing rules we introduced here could be of more general use in the setting of probabilistic programs. We have already discussed the usefulness for other directions in the deterministic case [19]. One way to approach this problem could be by looking at the family of products recently identified in [5]. These products give a model for a logic to reason about negative dependence between probabilistic variables. It would be interesting to see if the properties of these products match the ones we have here. 

On the other hand, we would like to understand if Bunched Fuzz can be used 
to reason about more general examples in differential privacy. One way to ap- 
proach this problem could be to consider examples based on the use of Hellinger 
distance that have been studied in the literature on probabilistic inference [6]. 
Abstract. We study the foundations of variational inference, which 
frames posterior inference as an optimisation problem, for probabilis- 
tic programming. The dominant approach for optimisation in practice is 
stochastic gradient descent. In particular, a variant using the so-called 
reparameterisation gradient estimator exhibits fast convergence in a tra- 
ditional statistics setting. Unfortunately, discontinuities, which are read- 
ily expressible in programming languages, can compromise the correct- 
ness of this approach. We consider a simple (higher-order, probabilistic) 
programming language with conditionals, and we endow our language 
with both a measurable and a smoothed (approximate) value semantics. 
We present type systems which establish technical pre-conditions. Thus 
we can prove stochastic gradient descent with the reparameterisation 
gradient estimator to be correct when applied to the smoothed problem. 
Besides, we can solve the original problem up to any error tolerance by 
choosing an accuracy coefficient suitably. Empirically we demonstrate 
that our approach has a similar convergence as a key competitor, but 
is simpler, faster, and attains orders of magnitude reduction in work- 
normalised variance. 


Keywords: probabilistic programming · variational inference · reparameterisation gradient · value semantics · type systems. 


1 Introduction 


Probabilistic programming is a programming paradigm which has the vision 
to make statistical methods, in particular Bayesian inference, accessible to a 
wide audience. This is achieved by a separation of concerns: the domain experts 
wishing to gain statistical insights focus on modelling, whilst the inference is per- 
formed automatically. (In some recent systems [4,9] users can improve efficiency 
by writing their own inference code.) 

In essence, probabilistic programming languages extend more traditional programming languages with constructs such as score or observe (as well as sample) to define the prior $p(z)$ and likelihood $p(x \mid z)$. The task of inference is to derive the posterior $p(z \mid x)$, which is in principle governed by Bayes’ law yet usually intractable. 

Whilst the paradigm was originally conceived in the context of statistics 
and Bayesian machine learning, probabilistic programming has in recent years 
proven to be a very fruitful subject for the programming language community. 
Researchers have made significant theoretical contributions such as underpinning 
languages with rigorous (categorical) semantics [35,34,15,37,12,10] and investi- 
gating the correctness of inference algorithms [16,7,22]. The latter were mostly 
designed in the context of “traditional” statistics and features such as condition- 
als, which are ubiquitous in programming, pose a major challenge for correctness. 

Inference algorithms broadly fall into two categories: Markov chain Monte 
Carlo (MCMC), which yields a sequence of samples asymptotically approaching 
the true posterior, and variational inference. 


Variational Inference. In the variational inference approach to Bayesian statistics [40,30,5,6], the problem of approximating difficult-to-compute posterior probability distributions is transformed to an optimisation problem. The idea is to approximate the posterior probability $p(z \mid x)$ using a family of “simpler” densities $q_\theta(z)$ over the latent variables $z$, parameterised by $\theta$. The optimisation problem is then to find the parameter $\theta^*$ such that $q_{\theta^*}(z)$ is “closest” to the true posterior $p(z \mid x)$. Since the variational family may not contain the true posterior, $q_{\theta^*}$ is an approximation in general. In practice, variational inference has proven to yield good approximations much faster than MCMC. 

Formally, the idea is captured by minimising the KL-divergence [30,5] between the variational approximation and the true posterior. This is equivalent to maximising the ELBO function, which only depends on the joint distribution $p(x, z)$ and not the posterior, which we seek to infer after all: 

\[ \mathrm{ELBO}_\theta := \mathbb{E}_{z \sim q_\theta(z)}[\log p(x, z) - \log q_\theta(z)] \tag{1} \]


Gradient Based Optimisation. In practice, variants of Stochastic Gradient Descent (SGD) are frequently employed to solve optimisation problems of the following form: $\operatorname{argmin}_\theta \mathbb{E}_{s \sim q(s)}[f(\theta, s)]$. In its simplest version, SGD follows Monte Carlo estimates of the gradient in each step: 

\[ \theta_{k+1} := \theta_k - \gamma_k \cdot \underbrace{\frac{1}{N} \sum_{i=1}^{N} \nabla_\theta f\big(\theta_k, s_k^{(i)}\big)}_{\text{gradient estimator}} \]

where $s_k^{(i)} \sim q\big(s_k^{(i)}\big)$ and $\gamma_k$ is the step size. 

For the correctness of SGD it is crucial that the estimation of the gradient is unbiased, i.e. correct in expectation: 

\[ \mathbb{E}_{s^{(1)}, \dots, s^{(N)} \sim q}\left[\frac{1}{N} \sum_{i=1}^{N} \nabla_\theta f\big(\theta, s^{(i)}\big)\right] = \nabla_\theta \mathbb{E}_{s \sim q(s)}[f(\theta, s)] \]

This property, which is about commuting differentiation and integration, can be established by the dominated convergence theorem [21, Theorem 6.28]. 
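
The update rule can be sketched on a toy objective. The example below is an illustration under assumptions made here, not the paper's implementation: it takes $f(\theta, s) = (\theta - (s + 3))^2$ with $s \sim \mathcal{N}(0, 1)$, whose expected value is minimised at $\theta = 3$, and the step size $\gamma_k = 0.5/k$.

```python
import random

def grad_f(theta, s):
    """Gradient in theta of the toy objective f(theta, s) = (theta - (s + 3))**2."""
    return 2.0 * (theta - (s + 3.0))

def sgd(theta0=0.0, steps=2000, batch=10, seed=0):
    """Plain SGD: each step follows a Monte Carlo estimate of the gradient."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, steps + 1):
        gamma = 0.5 / k  # decreasing step size (an assumed schedule)
        estimate = sum(grad_f(theta, rng.gauss(0.0, 1.0)) for _ in range(batch)) / batch
        theta -= gamma * estimate
    return theta
```

Because the gradient of this objective is smooth in $\theta$, the estimator is unbiased and the iterates settle near 3.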
Note that we cannot directly estimate the gradient of the ELBO in Eq. (1) 
with Monte Carlo because the distribution w.r.t. which the expectation is taken 
also depends on the parameters. However, the so-called log-derivative trick can 
be used to derive an unbiased estimate, which is known as the Score or REIN- 
FORCE estimator [31,38,27,28]. 


Reparameterisation Gradient. Whilst the score estimator has the virtue of being very widely applicable, it unfortunately suffers from high variance, which can cause SGD to yield very poor results³. 

The reparameterisation gradient estimator, the dominant approach in variational inference, reparameterises the latent variable $z$ in terms of a base random variable $s$ (viewed as the entropy source) via a diffeomorphic transformation $\phi_\theta$, such as a location-scale transformation or cumulative distribution function. For example, if the distribution of the latent variable $z$ is a Gaussian $\mathcal{N}(z \mid \mu, \sigma^2)$ with parameters $\theta = \{\mu, \sigma\}$ then the location-scale transformation using the standard normal as the base distribution gives rise to the reparameterisation 

\[ z \sim \mathcal{N}(z \mid \mu, \sigma^2) \iff z = \phi_\theta(s), \quad s \sim \mathcal{N}(0, 1), \tag{2} \]

where $\phi_\theta(s) = s \cdot \sigma + \mu$. The key advantage of this setup (often called “reparameterisation trick” [20,36,32]) is that we have removed the dependency on $\theta$ from the distribution w.r.t. which the expectation is taken. Therefore, we can now differentiate (by backpropagation) with respect to the parameters $\theta$ of the variational distributions using Monte Carlo simulation with draws from the base distribution $s$. Thus, succinctly, we have 

\[ \nabla_\theta \mathbb{E}_{z \sim q_\theta(z)}[f(\theta, z)] = \nabla_\theta \mathbb{E}_{s \sim q(s)}[f(\theta, \phi_\theta(s))] = \mathbb{E}_{s \sim q(s)}[\nabla_\theta f(\theta, \phi_\theta(s))] \]


The main benefit of the reparameterisation gradient estimator is that it has 
a significantly lower variance than the score estimator, resulting in faster con- 
vergence. 
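
As a small worked instance of the location-scale reparameterisation, the sketch below estimates the gradient of $\mathbb{E}_{z \sim \mathcal{N}(\mu, \sigma^2)}[z^2] = \mu^2 + \sigma^2$, whose exact gradient is $(2\mu, 2\sigma)$. The objective $z^2$ and the function name are assumptions chosen for illustration; the derivatives are written out by hand rather than obtained by backpropagation.

```python
import random

def reparam_gradient(mu, sigma, n=100_000, seed=0):
    """Monte Carlo reparameterisation gradient of E[z**2] with z = s * sigma + mu,
    s ~ N(0, 1). By the chain rule, d(z**2)/d(mu) = 2*z and d(z**2)/d(sigma) = 2*z*s."""
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)   # draw from the base distribution
        z = s * sigma + mu        # location-scale transformation
        g_mu += 2.0 * z
        g_sigma += 2.0 * z * s
    return g_mu / n, g_sigma / n
```

Calling `reparam_gradient(1.0, 2.0)` should return values close to the exact gradient $(2, 4)$.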


Bias of the Reparameterisation Gradient. Unfortunately, the reparameterisation gradient estimator is biased for non-differentiable models [23], which are readily expressible in programming languages with conditionals: 

Example 1. The counterexample in [23, Proposition 2], where the objective function is the ELBO for a non-differentiable model, can be simplified to 

\[ f(\theta, s) = -0.5 \cdot \theta^2 + \begin{cases} 0 & \text{if } s + \theta < 0 \\ 1 & \text{otherwise} \end{cases} \]

Observe that (see Fig. 1a): 

\[ \nabla_\theta \mathbb{E}_{s \sim \mathcal{N}(0,1)}[f(\theta, s)] = -\theta + \mathcal{N}(-\theta \mid 0, 1) \ne -\theta = \mathbb{E}_{s \sim \mathcal{N}(0,1)}[\nabla_\theta f(\theta, s)] \]


3 see e.g. Fig. 5a or [28] 
(a) Dashed red: biased estimator $\mathbb{E}_{s \sim \mathcal{N}(0,1)}[\nabla_\theta f(\theta, s)]$; solid green: true gradient $\nabla_\theta \mathbb{E}_{s \sim \mathcal{N}(0,1)}[f(\theta, s)]$. 
(b) ELBO trajectories (higher means better) obtained with our implementation (cf. Section 7). 

Fig. 1: Bias of the reparameterisation gradient estimator for Example 1. 


Crucially this may compromise convergence to critical points or maximisers: 
even if we can find a point where the gradient estimator vanishes, it may not 
be a critical point (let alone optimum) of the original optimisation problem 
(cf. Fig. 1b). 
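
The gap in Example 1 can be reproduced with closed forms. The sketch below (function names are illustrative) uses the fact that $\mathbb{E}_{s \sim \mathcal{N}(0,1)}[f(\theta, s)] = -0.5\,\theta^2 + \Phi(\theta)$, where $\Phi$ is the standard normal CDF, and compares a finite-difference derivative of this expectation with the average of the pointwise derivatives, which equals $-\theta$ wherever $s + \theta \ne 0$.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_f(theta):
    """E_s[f(theta, s)] = -0.5*theta**2 + P(s + theta >= 0) = -0.5*theta**2 + Phi(theta)."""
    return -0.5 * theta ** 2 + std_normal_cdf(theta)

def true_gradient(theta):
    """Derivative of expected_f: -theta + N(-theta | 0, 1)."""
    return -theta + std_normal_pdf(-theta)

def reparam_estimate(theta, samples):
    """Average of pointwise derivatives of f in theta; the step contributes 0
    almost everywhere, so every sample yields -theta."""
    return sum(-theta for _ in samples) / len(samples)
```

At $\theta = 0.5$ the true gradient is about $-0.148$ while the estimator returns $-0.5$ for every sample, so no amount of averaging removes the bias.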


Informal Approach 


As our starting point we take a variant of the simply typed lambda calculus with reals, conditionals and a sampling construct. We abstract the optimisation of the ELBO to the following generic optimisation problem 

\[ \operatorname{argmin}_\theta \mathbb{E}_{s \sim \mathcal{D}}[[\![M]\!](\theta, s)] \tag{3} \]

where $[\![M]\!]$ is the value function [7,26] of a program $M$ and $\mathcal{D}$ is independent of the parameters $\theta$ and it is determined by the distributions from which $M$ samples. Owing to the presence of conditionals, the function $[\![M]\!]$ may not be continuous, let alone differentiable. 

Example 1 can be expressed as 

\[ (\lambda z.\ -0.5 \cdot \theta^2 + (\mathbf{if}\ z < 0\ \mathbf{then}\ 0\ \mathbf{else}\ 1))\ (\mathbf{sample}_{\mathcal{N}} + \theta) \]

Our approach is based on a denotational semantics $[\![(-)]\!]_\eta$ (for accuracy coefficient $\eta > 0$) of programs in the (new) cartesian closed category VectFr, which generalises smooth manifolds and extends Frölicher spaces (see e.g. [13,33]) with a vector space structure. 

Intuitively, we replace the Heaviside step-function usually arising in the interpretation of conditionals by smooth approximations. In particular, we interpret the conditional of Example 1 as 

\[ [\![\mathbf{if}\ s + \theta < 0\ \mathbf{then}\ 0\ \mathbf{else}\ 1]\!]_\eta(\theta, s) := \sigma_\eta(s + \theta) \]

where $\sigma_\eta$ is a smooth function. For instance we can choose $\sigma_\eta(x) := \sigma(\frac{x}{\eta})$ where $\sigma(x) := \frac{1}{1 + \exp(-x)}$ is the (logistic) sigmoid function (cf. Fig. 2). Thus, the program $M$ is interpreted by a smooth function $[\![M]\!]_\eta$, for which the reparameterisation gradient may be estimated unbiasedly. Therefore, we apply stochastic gradient descent on the smoothed program. 

Fig. 2: (Logistic) sigmoid function $\sigma_\eta$ for two values of $\eta$ (dotted, dashed) and the Heaviside step function (red, solid). 
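
The effect of smoothing on Example 1 can be sketched as follows, assuming the logistic choice of $\sigma_\eta$ described in the text, a hand-written derivative, and $\eta = 0.1$ (all names and the value of $\eta$ are illustrative). The Monte Carlo average of the smoothed pointwise gradients lands close to the true gradient $-\theta + \mathcal{N}(-\theta \mid 0, 1)$ of the original problem, rather than at the biased value $-\theta$.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def grad_smoothed(theta, s, eta):
    """d/d(theta) of -0.5*theta**2 + sigmoid((s + theta) / eta)."""
    sig = sigmoid((s + theta) / eta)
    return -theta + sig * (1.0 - sig) / eta

def smoothed_estimate(theta, eta, n=100_000, seed=0):
    """Reparameterisation gradient estimate for the smoothed objective."""
    rng = random.Random(seed)
    return sum(grad_smoothed(theta, rng.gauss(0.0, 1.0), eta) for _ in range(n)) / n

def true_gradient(theta):
    """Gradient of the unsmoothed expectation: -theta + N(-theta | 0, 1)."""
    return -theta + math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)
```

Smaller values of $\eta$ bring the smoothed gradient closer to the true one, but the pointwise gradients then have larger magnitude near the discontinuity, so the estimate becomes noisier for a fixed sample size.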


Contributions 


The high-level contribution of this paper is laying a theoretical foundation for 
correct yet efficient (variational) inference for probabilistic programming. We 
employ a smoothed interpretation of programs to obtain unbiased (reparame- 
terisation) gradient estimators and establish technical pre-conditions by type 
systems. In more detail: 


1. We present a simple (higher-order) programming language with conditionals. 
We employ trace types to capture precisely the samples drawn in a fully eager 
call-by-value evaluation strategy. 

2. We endow our language with both a (measurable) denotational value semantics and a smoothed (hence approximate) value semantics. For the latter we furnish a categorical model based on Frölicher spaces. 

3. We develop type systems enforcing vital technical pre-conditions: unbiased- 
ness of the reparameterisation gradient estimator and the correctness of 
stochastic gradient descent, as well as the uniform convergence of the smooth- 
ing to the original problem. Thus, our smoothing approach in principle yields 
correct solutions up to arbitrary error tolerances. 

4. We conduct an empirical evaluation demonstrating that our approach ex- 
hibits a similar convergence to an unbiased correction of the reparameterised 
gradient estimator by [23] — our main baseline. However our estimator is sim- 
pler and more efficient: it is faster and attains orders of magnitude reduction 
in work-normalised variance. 


Outline. In the next section we introduce a simple higher-order probabilistic pro- 
gramming language, its denotational value semantics and operational semantics; 
Optimisation Problem 1 is then stated. Section 3 is devoted to a smoothed deno- 
tational value semantics, and we state the Smooth Optimisation Problem 2. In 
Sections 4 and 5 we develop annotation based type systems enforcing the correct- 
ness of SGD and the convergence of the smoothing, respectively. Related work 
is briefly discussed in Section 6 before we present the results of our empirical 
evaluation in Section 7. We conclude in Section 8 and discuss future directions. 


Notation. We use the following conventions: bold font for vectors and lists, $+\!\!+$ for concatenation of lists, $\nabla_\theta$ for gradients (w.r.t. $\theta$), $[\phi]$ for the Iverson bracket of a predicate $\phi$ and calligraphic font for distributions, in particular $\mathcal{N}$ for normal distributions. Besides, we highlight noteworthy items using red. 
2 A Simple Programming Language 


In this section, we introduce our programming language, which is the simply- 
typed lambda calculus with reals, augmented with conditionals and sampling 
from continuous distributions. 


2.1 Syntax 


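The raw terms of the programming language are defined by the grammar: 

\[ M ::= x \mid \theta_i \mid \underline{r} \mid \underline{+} \mid \underline{\cdot} \mid \underline{-} \mid \underline{{}^{-1}} \mid \underline{\exp} \mid \underline{\log} \mid \mathbf{if}\ M < 0\ \mathbf{then}\ M\ \mathbf{else}\ M \mid \mathbf{sample}_{\mathcal{D}} \mid \lambda x.\ M \mid M\ M \]

where $x$ and $\theta_i$ respectively range over (denumerable collections of) variables and parameters, $r \in \mathbb{R}$, and $\mathcal{D}$ is a probability distribution over $\mathbb{R}$ (potentially with a support which is a strict subset of $\mathbb{R}$). As is customary we use infix, postfix and prefix notation: $M \underline{+} N$ (addition), $M \underline{\cdot} N$ (multiplication), $M^{\underline{-1}}$ (inverse), and $\underline{-}M$ (numeric negation). We frequently omit the underline to reduce clutter. 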


Example 2 (Encoding the ELBO for Variational Inference). We consider the example used by [23] in their Prop. 2 to prove the biasedness of the reparameterisation gradient. (In Example 1 we discussed a simplified version thereof.) The density is 

\[ p(z) := \mathcal{N}(z \mid 0, 1) \cdot \begin{cases} \mathcal{N}(0 \mid -2, 1) & \text{if } z < 0 \\ \mathcal{N}(0 \mid 5, 1) & \text{otherwise} \end{cases} \]

and they use a variational family with density $q_\theta(z) := \mathcal{N}(z \mid \theta, 1)$, which is reparameterised using a standard normal noise distribution and transformation $s \mapsto s + \theta$. 

First, we define an auxiliary term for the pdf of normals with mean $m$ and standard deviation $s$: 

\[ N \equiv \lambda x, m, s.\ (\sqrt{2\pi} \cdot s)^{-1} \cdot \exp\big(-0.5 \cdot ((x + (-m)) \cdot s^{-1})^2\big) \]

Then, we can define 

\[ M \equiv \Big(\lambda z.\ \underbrace{\log (N\ z\ 0\ 1) + (\mathbf{if}\ z < 0\ \mathbf{then}\ \log (N\ 0\ (-2)\ 1)\ \mathbf{else}\ \log (N\ 0\ 5\ 1))}_{\log p} - \underbrace{\log (N\ z\ \theta\ 1)}_{\log q_\theta}\Big)\ (\mathbf{sample}_{\mathcal{N}} + \theta) \]

2.2 A Basic Trace-Based Type System 


Types are generated from base types ($\mathbb{R}$ and $\mathbb{R}_{>0}$, the reals and positive reals) and trace types (typically $\Sigma$, which is a finite list of probability distributions) as well as by a trace-based function space constructor of the form $\tau \bullet \Sigma \to \tau'$. Formally types are defined by the following grammar: 

\[ \begin{array}{lrcl} \text{trace types} & \Sigma & ::= & [\mathcal{D}_1, \dots, \mathcal{D}_n] \quad n \ge 0 \\ \text{base types} & \iota & ::= & \mathbb{R} \mid \mathbb{R}_{>0} \\ \text{safe types} & \sigma & ::= & \iota \mid \sigma \bullet [] \to \sigma \\ \text{types} & \tau & ::= & \iota \mid \tau \bullet \Sigma \to \tau \end{array} \]

where $\mathcal{D}_i$ are probability distributions. Intuitively a trace type is a description of the space of execution traces of a probabilistic program. Using trace types, a distinctive feature of our type system is that a program’s type precisely characterises the space of its possible execution traces [24]. We use list concatenation notation $+\!\!+$ for trace types, and the shorthand $\tau_1 \to \tau_2$ for function types of the form $\tau_1 \bullet [] \to \tau_2$. Intuitively, a term has type $\tau \bullet \Sigma \to \tau'$ if, when given a value of type $\tau$, it reduces to a value of type $\tau'$ using all the samples in $\Sigma$. 

Dual context typing judgements of the form $\Gamma \mid \Sigma \vdash M : \tau$ are defined in Fig. 3b, where $\Gamma = x_1 : \tau_1, \dots, x_n : \tau_n, \theta_1 : \tau_1', \dots, \theta_m : \tau_m'$ is a finite map describing a set of variable-type and parameter-type bindings; and the trace type $\Sigma$ precisely captures the distributions from which samples are drawn in a (fully eager) call-by-value evaluation of the term $M$. 

The subtyping of types, as defined in Fig. 3a, is essentially standard; for contexts, we define $\Gamma \sqsubseteq \Gamma'$ if for every $x : \tau$ in $\Gamma$ there exists $x : \tau'$ in $\Gamma'$ such that $\tau' \sqsubseteq \tau$. 

Trace types are unique [18]: 

Lemma 1. If $\Gamma \mid \Sigma \vdash M : \tau$ and $\Gamma \mid \Sigma' \vdash M : \tau'$ then $\Sigma = \Sigma'$. 

A term has safe type $\sigma$ if it does not contain $\mathbf{sample}_{\mathcal{D}}$ or $\sigma$ is a base type. Thus, perhaps slightly confusingly, we have $\mid [\mathcal{D}] \vdash \mathbf{sample}_{\mathcal{D}} : \mathbb{R}$, and $\mathbb{R}$ is considered a safe type. Note that we use the metavariable $\sigma$ to denote safe types. 


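Conditionals. The branches of conditionals must have a safe type. Otherwise it would not be clear how to type terms such as 

\[ \begin{array}{l} M \equiv \mathbf{if}\ x < 0\ \mathbf{then}\ (\lambda x.\ \mathbf{sample}_{\mathcal{N}})\ \mathbf{else}\ (\lambda x.\ \mathbf{sample}_{\mathcal{E}} + \mathbf{sample}_{\mathcal{E}}) \\ N \equiv (\lambda f.\ f\ (f\ \mathbf{sample}_{\mathcal{N}}))\ M \end{array} \]

because the branches draw a different number of samples from different distributions, and have types $\mathbb{R} \bullet [\mathcal{N}] \to \mathbb{R}$ and $\mathbb{R} \bullet [\mathcal{E}, \mathcal{E}] \to \mathbb{R}$, respectively. However, for $M' \equiv \mathbf{if}\ x < 0\ \mathbf{then}\ \mathbf{sample}_{\mathcal{N}}\ \mathbf{else}\ \mathbf{sample}_{\mathcal{E}} + \mathbf{sample}_{\mathcal{E}}$ we can (safely) type 

\[ \begin{array}{l} x : \mathbb{R} \mid [\mathcal{N}, \mathcal{E}, \mathcal{E}] \vdash M' : \mathbb{R} \\ \mid [] \vdash \lambda x.\ M' : \mathbb{R} \bullet [\mathcal{N}, \mathcal{E}, \mathcal{E}] \to \mathbb{R} \\ \mid [\mathcal{N}, \mathcal{N}, \mathcal{E}, \mathcal{E}, \mathcal{N}, \mathcal{E}, \mathcal{E}] \vdash (\lambda f.\ f\ (f\ \mathbf{sample}_{\mathcal{N}}))\ (\lambda x.\ M') : \mathbb{R} \end{array} \]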
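(a) Subtyping 

\[ \frac{}{\iota \sqsubseteq \iota} \qquad \frac{}{\mathbb{R}_{>0} \sqsubseteq \mathbb{R}} \qquad \frac{\tau_1' \sqsubseteq \tau_1 \quad \tau_2 \sqsubseteq \tau_2'}{(\tau_1 \bullet \Sigma \to \tau_2) \sqsubseteq (\tau_1' \bullet \Sigma \to \tau_2')} \]

(b) Typing judgments 

\[ \frac{\Gamma \mid \Sigma \vdash M : \tau}{\Gamma' \mid \Sigma \vdash M : \tau'}\ \Gamma \sqsubseteq \Gamma',\ \tau \sqsubseteq \tau' \qquad \frac{}{x : \tau \mid [] \vdash x : \tau} \]

\[ \frac{}{\mid [] \vdash \underline{r} : \mathbb{R}}\ r \in \mathbb{R} \qquad \frac{}{\mid [] \vdash \underline{r} : \mathbb{R}_{>0}}\ r \in \mathbb{R}_{>0} \]

\[ \frac{}{\mid [] \vdash \underline{\circ} : \mathbb{R} \to \mathbb{R} \to \mathbb{R}}\ \circ \in \{+, \cdot\} \qquad \frac{}{\mid [] \vdash \underline{\circ} : \mathbb{R}_{>0} \to \mathbb{R}_{>0} \to \mathbb{R}_{>0}}\ \circ \in \{+, \cdot\} \]

\[ \frac{}{\mid [] \vdash \underline{-} : \mathbb{R} \to \mathbb{R}} \qquad \frac{}{\mid [] \vdash \underline{{}^{-1}} : \mathbb{R}_{>0} \to \mathbb{R}_{>0}} \qquad \frac{}{\mid [] \vdash \underline{\exp} : \mathbb{R} \to \mathbb{R}_{>0}} \qquad \frac{}{\mid [] \vdash \underline{\log} : \mathbb{R}_{>0} \to \mathbb{R}} \]

\[ \frac{\Gamma \mid \Sigma \vdash L : \mathbb{R} \quad \Gamma \mid \Sigma' \vdash M : \sigma \quad \Gamma \mid \Sigma'' \vdash N : \sigma}{\Gamma \mid \Sigma +\!\!+ \Sigma' +\!\!+ \Sigma'' \vdash \mathbf{if}\ L < 0\ \mathbf{then}\ M\ \mathbf{else}\ N : \sigma} \qquad \frac{}{\mid [\mathcal{D}] \vdash \mathbf{sample}_{\mathcal{D}} : \mathbb{R}} \]

\[ \frac{\Gamma, y : \tau_1 \mid \Sigma \vdash M : \tau_2}{\Gamma \mid [] \vdash \lambda y.\ M : \tau_1 \bullet \Sigma \to \tau_2} \qquad \frac{\Gamma \mid \Sigma_1 \vdash M : \tau_1 \bullet \Sigma_3 \to \tau_2 \quad \Gamma \mid \Sigma_2 \vdash N : \tau_1}{\Gamma \mid \Sigma_1 +\!\!+ \Sigma_2 +\!\!+ \Sigma_3 \vdash M\ N : \tau_2} \]

Fig. 3: A Basic Trace-based Type System 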


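Example 3. Consider the following terms: 

\[ \begin{array}{l} L \equiv \lambda x.\ \mathbf{sample}_{\mathcal{N}} + \mathbf{sample}_{\mathcal{N}} \\ M \equiv \mathbf{if}\ x < 0\ \mathbf{then}\ (\lambda y.\ y + y)\ \mathbf{sample}_{\mathcal{N}}\ \mathbf{else}\ (\mathbf{sample}_{\mathcal{N}} + \mathbf{sample}_{\mathcal{N}}) \end{array} \]

We can derive the following typing judgements: 

\[ \begin{array}{l} \mid [] \vdash L : \mathbb{R}_{>0} \bullet [\mathcal{N}, \mathcal{N}] \to \mathbb{R} \\ x : \mathbb{R}_{>0} \mid [\mathcal{N}, \mathcal{N}, \mathcal{N}] \vdash M : \mathbb{R} \\ \mid [] \vdash \lambda x.\ M : \mathbb{R}_{>0} \bullet [\mathcal{N}, \mathcal{N}, \mathcal{N}] \to \mathbb{R} \\ \mid [\mathcal{N}, \mathcal{N}, \mathcal{N}, \mathcal{N}] \vdash (\lambda x.\ M)\ \mathbf{sample}_{\mathcal{N}} : \mathbb{R} \\ \mid [\mathcal{N}, \mathcal{N}] \vdash (\lambda f.\ f\ (f\ 0))\ (\lambda x.\ \mathbf{sample}_{\mathcal{N}}) : \mathbb{R} \end{array} \]

Note that $\mathbf{if}\ x < 0\ \mathbf{then}\ (\lambda x.\ \mathbf{sample}_{\mathcal{N}})\ \mathbf{else}\ (\lambda x.\ x)$ is not typable. 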


2.3 Denotational Value Semantics 


Next, we endow our language with a (measurable) value semantics. It is well-known that the category of measurable spaces and measurable functions is not cartesian-closed [1], which means that there is no interpretation of the lambda calculus as measurable functions. These difficulties led [14] to develop the category QBS of quasi-Borel spaces. Notably, morphisms can be combined piecewise, which we need for conditionals. 

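We interpret our programming language in the category QBS of quasi-Borel spaces. Types are interpreted as follows: 

\[ [\![\mathbb{R}]\!] := (\mathbb{R}, M_{\mathbb{R}}) \qquad [\![\mathbb{R}_{>0}]\!] := (\mathbb{R}_{>0}, M_{\mathbb{R}_{>0}}) \qquad [\![[\mathcal{D}_1, \dots, \mathcal{D}_n]]\!] := (\mathbb{R}, M_{\mathbb{R}})^n \]
\[ [\![\tau_1 \bullet \Sigma \to \tau_2]\!] := [\![\tau_1]\!] \times [\![\Sigma]\!] \Rightarrow [\![\tau_2]\!] \]

where $M_{\mathbb{R}}$ is the set of measurable functions $\mathbb{R} \to \mathbb{R}$; similarly for $M_{\mathbb{R}_{>0}}$. (As for trace types, we use list notation (and list concatenation) for traces.) 

We first define a handy helper function for interpreting application. For $f : [\![\Gamma]\!] \times \mathbb{R}^{n_1} \Rightarrow [\![\tau_1 \bullet \Sigma_3 \to \tau_2]\!]$ and $g : [\![\Gamma]\!] \times \mathbb{R}^{n_2} \Rightarrow [\![\tau_1]\!]$ define 

\[ \begin{array}{l} f \mathbin{@} g : [\![\Gamma]\!] \times \mathbb{R}^{n_1 + n_2 + |\Sigma_3|} \Rightarrow [\![\tau_2]\!] \\ (\gamma, s_1 +\!\!+ s_2 +\!\!+ s_3) \mapsto f(\gamma, s_1)(g(\gamma, s_2), s_3) \qquad s_1 \in \mathbb{R}^{n_1},\ s_2 \in \mathbb{R}^{n_2},\ s_3 \in \mathbb{R}^{|\Sigma_3|} \end{array} \]

We interpret terms-in-context, $[\![\Gamma \mid \Sigma \vdash M : \tau]\!] : [\![\Gamma]\!] \times [\![\Sigma]\!] \to [\![\tau]\!]$, as follows: 

\[ [\![\Gamma \mid [\mathcal{D}] \vdash \mathbf{sample}_{\mathcal{D}} : \mathbb{R}]\!](\gamma, [s]) := s \]
\[ [\![\Gamma \mid [] \vdash \lambda y.\ M : \tau_1 \bullet \Sigma \to \tau_2]\!](\gamma, []) := (v, s) \in [\![\tau_1]\!] \times [\![\Sigma]\!] \mapsto [\![\Gamma, y : \tau_1 \mid \Sigma \vdash M : \tau_2]\!]((\gamma, v), s) \]
\[ [\![\Gamma \mid \Sigma_1 +\!\!+ \Sigma_2 +\!\!+ \Sigma_3 \vdash M\ N : \tau_2]\!] := [\![\Gamma \mid \Sigma_1 \vdash M : \tau_1 \bullet \Sigma_3 \to \tau_2]\!] \mathbin{@} [\![\Gamma \mid \Sigma_2 \vdash N : \tau_1]\!] \]
\[ [\![\Gamma \mid \Sigma_1 +\!\!+ \Sigma_2 +\!\!+ \Sigma_3 \vdash \mathbf{if}\ L < 0\ \mathbf{then}\ M\ \mathbf{else}\ N : \tau]\!](\gamma, s_1 +\!\!+ s_2 +\!\!+ s_3) := \begin{cases} [\![\Gamma \mid \Sigma_2 \vdash M : \tau]\!](\gamma, s_2) & \text{if } [\![\Gamma \mid \Sigma_1 \vdash L : \mathbb{R}]\!](\gamma, s_1) < 0 \\ [\![\Gamma \mid \Sigma_3 \vdash N : \tau]\!](\gamma, s_3) & \text{otherwise} \end{cases} \]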


It is not difficult to see that this interpretation of terms-in-context is
well-defined and total. For the conditional clause, we may assume that the
trace type and the trace are presented as partitions
$\Sigma_1 +\!\!+ \Sigma_2 +\!\!+ \Sigma_3$ and $s_1 +\!\!+ s_2 +\!\!+ s_3$ respectively. This is justified
because it follows from the judgement
$\Gamma \mid \Sigma_1 +\!\!+ \Sigma_2 +\!\!+ \Sigma_3 \vdash \mathbf{if}\ L < 0\ \mathbf{then}\ M\ \mathbf{else}\ N : \tau$ that
$\Gamma \mid \Sigma_1 \vdash L : \mathbb{R}$, $\Gamma \mid \Sigma_2 \vdash M : \sigma$ and $\Gamma \mid \Sigma_3 \vdash N : \sigma$ are provable; and we
know that each of $\Sigma_1$, $\Sigma_2$ and $\Sigma_3$ is unique, thanks to Lemma 1; their
respective lengths then determine the partition $s_1 +\!\!+ s_2 +\!\!+ s_3$. Similarly for
the application clause, the components $\Sigma_1$ and $\Sigma_2$ are determined by Lemma 1,
and $\Sigma_3$ by the type of M.


2.4 Relation to Operational Semantics 


We can also endow our language with a big-step CBV sampling-based semantics
similar to [7,26], as defined in [18, Fig. 6]. We write $M \Downarrow^{s}_{w} V$ to mean that
M reduces to value V, which is a real constant or an abstraction, using the
execution trace s and accumulating weight w.

Based on this, we can define the value- and weight-functions:

\[
\mathrm{value}_M(s) :=
\begin{cases}
  V & \text{if } M \Downarrow^{s}_{w} V \\
  \mathrm{undef} & \text{otherwise}
\end{cases}
\qquad
\mathrm{weight}_M(s) :=
\begin{cases}
  w & \text{if } M \Downarrow^{s}_{w} V \\
  0 & \text{otherwise}
\end{cases}
\]


Our semantics is a bit non-standard in that for conditionals we evaluate
both branches eagerly. The technical advantage is that for every (closed)
term-in-context, $\mid [\mathcal{D}_1, \ldots, \mathcal{D}_n] \vdash M : \iota$, M reduces to a (unique) value using
exactly the traces of the length encoded in the typing, i.e., n.

So in this sense, the operational semantics is “total”: there is no divergence.
Notice that there is no partiality caused by partial primitives such as 1/x,
thanks to the typing.

Moreover there is a simple connection to our denotational value semantics: 


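Proposition 1. Let $\mid [\mathcal{D}_1, \ldots, \mathcal{D}_n] \vdash M : \iota$. Then

1. $\mathrm{dom}(\mathrm{value}_M) = \mathbb{R}^n$
2. $\llbracket M \rrbracket = \mathrm{value}_M$
3. $\mathrm{weight}_M(s) = \prod_{j=1}^{n} \mathrm{pdf}_{\mathcal{D}_j}(s_j)$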
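
The eager treatment of conditionals and the product form of the weight in item 3 can be illustrated with a minimal Python sketch. It assumes a hypothetical first-order fragment encoded as nested tuples (`const`, `sample`, `add`, `if`) in which every `sample` is drawn from a standard normal; the names `run`, `trace_len` and `normal_pdf` are illustrative and are not taken from the paper or from [18].

```python
import math

def normal_pdf(x):
    """Density of the standard normal (assumed distribution of every sample)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def trace_len(t):
    """Number of samples a term consumes: the length n of its trace type."""
    if t[0] == "sample":
        return 1
    if t[0] == "const":
        return 0
    return sum(trace_len(c) for c in t[1:])

def run(t, s):
    """Evaluate term t on trace s; return (value, weight, unused rest of s)."""
    tag = t[0]
    if tag == "const":
        return t[1], 1.0, s
    if tag == "sample":
        return s[0], normal_pdf(s[0]), s[1:]
    if tag == "add":
        a, wa, s = run(t[1], s)
        b, wb, s = run(t[2], s)
        return a + b, wa * wb, s
    if tag == "if":
        # Both branches are evaluated eagerly, so the trace length is fixed.
        g, wg, s = run(t[1], s)
        m, wm, s = run(t[2], s)
        n, wn, s = run(t[3], s)
        return (m if g < 0 else n), wg * wm * wn, s
    raise ValueError(f"unknown term: {tag}")

# if sample < 0 then sample else (sample + sample)
M = ("if", ("sample",), ("sample",), ("add", ("sample",), ("sample",)))
value, weight, rest = run(M, [-1.0, 2.0, 3.0, 4.0])
```

In this sketch every trace of length `trace_len(M)` yields a value, and the weight is the product of the densities of all samples in the trace, including those consumed by the branch that is not selected.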


2.5 Problem Statement 
We are finally ready to formally state our optimisation problem: 


Problem 1. Optimisation 


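Given: term-in-context, $\theta_1 : \iota_1, \ldots, \theta_m : \iota_m \mid [\mathcal{D}_1, \ldots, \mathcal{D}_n] \vdash M : \mathbb{R}$


Find: $\operatorname{argmin}_{\theta}\ \mathbb{E}_{s_1 \sim \mathcal{D}_1, \ldots, s_n \sim \mathcal{D}_n} [\llbracket M \rrbracket (\theta, s)]$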


3 Smoothed Denotational Value Semantics 


Now we turn to our smoothed denotational value semantics, which we use to
avoid the bias in the reparameterisation gradient estimator. It is parameterised
by a family of smooth functions $\sigma_\eta : \mathbb{R} \to [0,1]$. Intuitively, we replace the
Heaviside step-function arising in the interpretation of conditionals by smooth
approximations (cf. Fig. 2). In particular, conditionals
$\mathbf{if}\ z < 0\ \mathbf{then}\ 0\ \mathbf{else}\ 1$ are interpreted as $z \mapsto \sigma_\eta(z)$ rather than $[z \geq 0]$
(using Iverson brackets).

Our primary example is $\sigma_\eta(x) := \sigma(x/\eta)$, where $\sigma$ is the (logistic) sigmoid
$\sigma(x) := \frac{1}{1 + \exp(-x)}$, see Fig. 2. Whilst at this stage no further properties
other than smoothness are required, we will later need to restrict $\sigma_\eta$ to have
good properties, in particular to converge to the Heaviside step function.
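
As a small illustration of the primary example, the following Python sketch evaluates the logistic smoothing next to the Iverson bracket it replaces. The function names `sigma`, `sigma_eta` and `step` are chosen here for illustration only; the branch on the sign of `x` is a standard trick to avoid floating-point overflow and is not part of the definition.

```python
import math

def sigma(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigma_eta(x, eta):
    """Smoothing sigma_eta(x) = sigma(x / eta) for an accuracy coefficient eta > 0."""
    return sigma(x / eta)

def step(x):
    """Iverson bracket [x >= 0], i.e. the unsmoothed `if x < 0 then 0 else 1`."""
    return 1.0 if x >= 0 else 0.0

if __name__ == "__main__":
    for eta in (1.0, 0.1, 0.01):
        print(eta, [round(sigma_eta(x, eta), 4) for x in (-0.5, 0.0, 0.5)])
```

Away from 0 the values approach those of `step` as `eta` decreases, whereas at 0 the smoothing stays at 1/2 for every `eta`.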

As a categorical model we propose vector Frölicher spaces VectFr, which (to
our knowledge) is a new construction, affording a simple and direct interpretation
of the smoothed conditionals.


3.1 Frölicher Spaces


We recall the definition of Frölicher spaces, which generalise smooth spaces⁴: A
Frölicher space is a triple $(X, C_X, F_X)$ where X is a set, $C_X \subseteq \mathrm{Set}(\mathbb{R}, X)$ is a
set of curves and $F_X \subseteq \mathrm{Set}(X, \mathbb{R})$ is a set of functionals, satisfying


1. if $c \in C_X$ and $f \in F_X$ then $f \circ c \in C^\infty(\mathbb{R}, \mathbb{R})$
2. if $c : \mathbb{R} \to X$ such that for all $f \in F_X$, $f \circ c \in C^\infty(\mathbb{R}, \mathbb{R})$ then $c \in C_X$
3. if $f : X \to \mathbb{R}$ such that for all $c \in C_X$, $f \circ c \in C^\infty(\mathbb{R}, \mathbb{R})$ then $f \in F_X$.


A morphism between Frölicher spaces $(X, C_X, F_X)$ and $(Y, C_Y, F_Y)$ is a map
$\phi : X \to Y$ satisfying $f \circ \phi \circ c \in C^\infty(\mathbb{R}, \mathbb{R})$ for all $f \in F_Y$ and $c \in C_X$.

Frölicher spaces and their morphisms constitute a category Fr, which is
well-known to be cartesian closed [13,33].


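3.2 Vector Frölicher Spaces


To interpret our programming language smoothly we would like to interpret
conditionals as $\sigma_\eta$-weighted convex combinations of their branches:

\[
\begin{aligned}
\llbracket \mathbf{if}\ L < 0\ \mathbf{then}\ M\ \mathbf{else}\ N \rrbracket_\eta (\gamma, s_1 +\!\!+ s_2 +\!\!+ s_3) :={}
& \sigma_\eta(-\llbracket L \rrbracket_\eta(\gamma, s_1)) \cdot \llbracket M \rrbracket_\eta(\gamma, s_2) \\
&+ \sigma_\eta(\llbracket L \rrbracket_\eta(\gamma, s_1)) \cdot \llbracket N \rrbracket_\eta(\gamma, s_3)
\end{aligned}
\tag{4}
\]

By what we have discussed so far, this only makes sense if the branches have
ground type because Frölicher spaces are not equipped with a vector space
structure but we take weighted combinations of morphisms. In particular if
$\phi_1, \phi_2 : X \to Y$ and $\alpha : X \to \mathbb{R}$ are morphisms then $\alpha\,\phi_1 + \phi_2$ ought to be
a morphism too. Therefore, we enrich Frölicher spaces with an additional vector
space structure: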


Definition 1. An $\mathbb{R}$-vector Frölicher space is a Frölicher space $(X, C_X, F_X)$
such that X is an $\mathbb{R}$-vector space and whenever $c, c' \in C_X$ and $\alpha \in C^\infty(\mathbb{R}, \mathbb{R})$
then $\alpha\, c + c' \in C_X$ (defined pointwise).

A morphism between $\mathbb{R}$-vector Frölicher spaces is a morphism between
Frölicher spaces, i.e. $\phi : (X, C_X, F_X) \to (Y, C_Y, F_Y)$ is a morphism if for all
$c \in C_X$ and $f \in F_Y$, $f \circ \phi \circ c \in C^\infty(\mathbb{R}, \mathbb{R})$.


$\mathbb{R}$-vector Frölicher spaces and their morphisms constitute a category VectFr.
There is an evident forgetful functor fully faithfully embedding VectFr in Fr.
Note that the above restriction is a bit stronger than requiring that $C_X$ is also a
vector space. ($\alpha$ is not necessarily a constant.) The main benefit is the following,
which is crucial for the interpretation of conditionals as in Eq. (4):


Lemma 2. If $\phi_1, \phi_2 \in \mathbf{VectFr}(X, Y)$ and $\alpha \in \mathbf{VectFr}(X, \mathbb{R})$ then
$\alpha\,\phi_1 + \phi_2 \in \mathbf{VectFr}(X, Y)$ (defined pointwise).


Proof. Suppose $c \in C_X$ and $f \in F_Y$. Then
$(\alpha\,\phi_1 + \phi_2) \circ c = (\alpha \circ c) \cdot (\phi_1 \circ c) + (\phi_2 \circ c) \in C_Y$
(defined pointwise) and the claim follows.


4 $C^\infty(\mathbb{R}, \mathbb{R})$ is the set of smooth functions $\mathbb{R} \to \mathbb{R}$




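Similarly as for Frölicher spaces, if X is an $\mathbb{R}$-vector space then any
$C \subseteq \mathrm{Set}(\mathbb{R}, X)$ generates an $\mathbb{R}$-vector Frölicher space $(X, C_X, F_X)$, where

\[
\begin{aligned}
F_X &:= \{ f : X \to \mathbb{R} \mid \forall c \in C.\ f \circ c \in C^\infty(\mathbb{R}, \mathbb{R}) \} \\
\widetilde{C}_X &:= \{ c : \mathbb{R} \to X \mid \forall f \in F_X.\ f \circ c \in C^\infty(\mathbb{R}, \mathbb{R}) \} \\
C_X &:= \Big\{ \sum_{i=1}^{n} \alpha_i\, c_i \ \Big|\ n \in \mathbb{N},\ \forall i \leq n.\ \alpha_i \in C^\infty(\mathbb{R}, \mathbb{R}),\ c_i \in \widetilde{C}_X \Big\}
\end{aligned}
\]

Having modified the notion of Frölicher spaces generated by a set of curves, the
proof for cartesian closure carries over [18] and we conclude: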


Proposition 2. VectFr is cartesian closed. 


3.3 Smoothed Interpretation 


We have now discussed all ingredients to interpret our language (smoothly) in
the cartesian closed category VectFr. We call $\llbracket M \rrbracket_\eta$ the η-smoothing of
$\llbracket M \rrbracket$ (or of M, by abuse of language). The interpretation is mostly standard
and follows Section 2.3, except for the case for conditionals. The latter is given
by Eq. (4), for which the additional vector space structure is required.
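
To make the conditional clause of Eq. (4) concrete, here is a minimal Python sketch for ground-type branches. It assumes that denotations are represented as plain Python functions of a parameter `theta` and a list of samples, and that the logistic sigmoid is used as the smoothing; the helper name `smoothed_cond` and the explicit trace lengths `n1`, `n2` are assumptions of this sketch, not notation from the paper.

```python
import math

def sigma_eta(x, eta):
    """Logistic smoothing sigma(x / eta), written to avoid overflow."""
    z = x / eta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smoothed_cond(guard, then_branch, else_branch, n1, n2, eta):
    """Smoothed `if guard < 0 then then_branch else else_branch`.

    guard consumes the first n1 samples, then_branch the next n2 and
    else_branch the remaining ones, mirroring s1 ++ s2 ++ s3 in Eq. (4).
    """
    def interp(theta, s):
        s1, s2, s3 = s[:n1], s[n1:n1 + n2], s[n1 + n2:]
        g = guard(theta, s1)
        return (sigma_eta(-g, eta) * then_branch(theta, s2)
                + sigma_eta(g, eta) * else_branch(theta, s3))
    return interp

# if (s + theta) < 0 then 0 else 1, smoothed with eta = 0.1
m = smoothed_cond(lambda th, s: s[0] + th,
                  lambda th, s: 0.0,
                  lambda th, s: 1.0,
                  n1=1, n2=0, eta=0.1)
```

Both branches are evaluated on their own part of the trace and then mixed, so the result is a smooth function of the guard value rather than a jump.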

Finally, we can phrase a smoothed version of our Optimisation Problem 1: 


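Problem 2. η-Smoothed Optimisation


Given: term-in-context, $\theta_1 : \iota_1, \ldots, \theta_m : \iota_m \mid [\mathcal{D}_1, \ldots, \mathcal{D}_n] \vdash M : \mathbb{R}$, and
accuracy coefficient $\eta > 0$


Find: $\operatorname{argmin}_{\theta}\ \mathbb{E}_{s_1 \sim \mathcal{D}_1, \ldots, s_n \sim \mathcal{D}_n} [\llbracket M \rrbracket_\eta (\theta, s)]$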


4 Correctness of SGD for Smoothed Problem and 
Unbiasedness of the Reparameterisation Gradient 


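Next, we apply stochastic gradient descent (SGD) with the reparameterisation
gradient estimator to the smoothed problem (for the batch size N = 1):

\[
\theta_{k+1} := \theta_k - \gamma_k \cdot \nabla_\theta \llbracket M \rrbracket_\eta (\theta_k, s_k)
\qquad s_k \sim \mathcal{D}
\tag{5}
\]

where $\theta \mid [s \sim \mathcal{D}] \vdash M : \mathbb{R}$ (slightly abusing notation in the trace type).
A classical choice for the step-size sequence is $\gamma_k \in \Theta(1/k)$, which satisfies
the so-called Robbins-Monro criterion:

\[
\sum_{k \in \mathbb{N}} \gamma_k = \infty
\qquad
\sum_{k \in \mathbb{N}} \gamma_k^2 < \infty
\tag{6}
\]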
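
The update rule (5) with a Robbins-Monro step size can be sketched in a few lines of Python. The objective below is a hypothetical toy model chosen for this illustration, namely the smoothing of θ·θ + (if s + θ < 0 then 0 else 1) with s drawn from a standard normal; it is not one of the paper's benchmarks, and the gradient is written out by hand rather than obtained by automatic differentiation.

```python
import math
import random

def sigma_eta(x, eta):
    """Logistic smoothing sigma(x / eta), written to avoid overflow."""
    z = x / eta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def grad_theta(theta, s, eta):
    """Gradient in theta of f_eta(theta, s) = theta^2 + sigma_eta(s + theta)."""
    p = sigma_eta(s + theta, eta)
    return 2.0 * theta + p * (1.0 - p) / eta

def sgd(theta0, eta, steps, seed=0, gamma0=0.5):
    """Run update (5) with batch size 1 and step sizes gamma_k = gamma0 / k."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, steps + 1):
        s_k = rng.gauss(0.0, 1.0)      # s_k ~ N(0, 1)
        gamma_k = gamma0 / k           # satisfies the Robbins-Monro criterion (6)
        theta -= gamma_k * grad_theta(theta, s_k, eta)
    return theta

theta_hat = sgd(theta0=1.0, eta=0.1, steps=20000)
```

For this toy objective the smoothed expectation is strongly convex on the region visited by the iterates, so the run settles near its unique stationary point (roughly θ ≈ -0.19 for η = 0.1 by a back-of-the-envelope calculation).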


In this section we wish to establish the correctness of the SGD procedure
applied to the smoothing Eq. (5).


4.1 Desiderata


First, we ought to take a step back and observe that the optimisation problems
we are trying to solve can be ill-defined due to a failure of integrability: take
$M \equiv (\lambda x.\ \exp(x \cdot x))\ \mathbf{sample}_{\mathcal{N}}$: we have $\mathbb{E}_{z \sim \mathcal{N}} [\llbracket M \rrbracket (z)] = \infty$, independently of
parameters. Therefore, we aim to guarantee:


(SGD0) The optimisation problems (both smoothed and unsmoothed) are
well-defined.
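
The failure of integrability for the term with exp(x · x) can be checked directly; the computation below assumes that the sample is drawn from the standard normal distribution:

```latex
\[
\mathbb{E}_{z \sim \mathcal{N}}\big[\exp(z \cdot z)\big]
= \int_{-\infty}^{\infty} e^{z^2} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \, dz
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{z^2/2} \, dz
= \infty .
\]
```

The integrand grows without bound, so no choice of parameters can make the objective finite.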


Since $\mathbb{E}[\llbracket M \rrbracket_\eta (\theta, s)]$ (and $\mathbb{E}[\llbracket M \rrbracket (\theta, s)]$) may not be a convex function in the
parameters $\theta$, we cannot hope to always find global optima. We seek instead
stationary points, where the gradient w.r.t. the parameters $\theta$ vanishes. The
following result (whose proof is standard) provides sufficient conditions for the
convergence of SGD to stationary points (see e.g. [3] or [2, Chapter 2]):


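Proposition 3 (Convergence). Suppose $(\gamma_k)_{k \in \mathbb{N}}$ satisfies the Robbins-Monro
criterion Eq. (6) and $g(\theta) := \mathbb{E}_s[f(\theta, s)]$ is well-defined. If $\Theta \subseteq \mathbb{R}^m$ satisfies


(SGD1) Unbiasedness: $\nabla_\theta g(\theta) = \mathbb{E}_s[\nabla_\theta f(\theta, s)]$ for all $\theta \in \Theta$
(SGD2) g is L-Lipschitz smooth on $\Theta$ for some L > 0:

\[
\| \nabla_\theta g(\theta) - \nabla_\theta g(\theta') \| \leq L \cdot \| \theta - \theta' \| \quad \text{for all } \theta, \theta' \in \Theta
\]

(SGD3) Bounded Variance: $\sup_{\theta \in \Theta} \mathbb{E}_s[\| \nabla_\theta f(\theta, s) \|^2] < \infty$


then $\inf_{i \in \mathbb{N}} \mathbb{E}[\| \nabla g(\theta_i) \|^2] = 0$ or $\theta_i \notin \Theta$ for some $i \in \mathbb{N}$.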


Unbiasedness (SGD1) requires commuting differentiation and integration.
The validity of this operation can be established by the dominated convergence
theorem [21, Theorem 6.28], see [18]. To be applicable the partial derivatives of f
w.r.t. the parameters need to be dominated uniformly by an integrable function.
Formally:


Definition 2. Let $f : \Theta \times \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$. We say that g uniformly
dominates f if for all $(\theta, s) \in \Theta \times \mathbb{R}^n$, $|f(\theta, s)| \leq g(s)$.


Also note that for Lipschitz smoothness (SGD2) it suffices to uniformly bound 
the second-order partial derivatives. 

In the remainder of this section we present two type systems which restrict 
the language to guarantee properties (SGD0) to (SGD3). 


4.2 Piecewise Polynomials and Distributions with Finite Moments 


As a first illustrative step we consider a type system $\vdash_{\mathrm{poly}}$, which restricts terms
to (piecewise) polynomials, and distributions with finite moments. Recall that a
distribution $\mathcal{D}$ has (all) finite moments if for all $p \in \mathbb{N}$, $\mathbb{E}_{s \sim \mathcal{D}}[|s|^p] < \infty$.
Distributions with finite moments include the following commonly used distributions:
normal, exponential, logistic and gamma distributions. A non-example is the
Cauchy distribution, which famously does not even have an expectation.

Definition 3. For a distribution $\mathcal{D}$ with finite moments, $f : \mathbb{R}^n \to \mathbb{R}$ has (all)
finite moments if for all $p \in \mathbb{N}$, $\mathbb{E}_{s \sim \mathcal{D}}[|f(s)|^p] < \infty$.


Functions with finite moments have good closure properties: 
Lemma 3. If $f, g : \mathbb{R}^n \to \mathbb{R}$ have (all) finite moments so do $-f$, $f + g$, $f \cdot g$.


In particular, if a distribution has finite moments then polynomials do, too.
Consequently, intuitively, it is sufficient to simply (the details are explicitly
spelled out in [18]):


1. require that the distributions $\mathcal{D}$ in the sample rule have finite moments:

\[
\mid [\mathcal{D}] \vdash_{\mathrm{poly}} \mathbf{sample}_{\mathcal{D}} : \mathbb{R}
\qquad \mathcal{D} \text{ has finite moments}
\]

2. remove the rules for ${}^{-1}$, exp and log from the type system $\vdash_{\mathrm{poly}}$.

Type Soundness I: Well-Definedness. Henceforth, we fix parameters
$\theta_1 : \iota_1, \ldots, \theta_m : \iota_m$. Intuitively, it is pretty obvious that $\llbracket M \rrbracket$ is a piecewise
polynomial whenever $\theta \mid \Sigma \vdash_{\mathrm{poly}} M : \iota$. Nonetheless, we prove the property formally
to illustrate our proof technique, a variant of logical relations, employed
throughout the rest of the paper.

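We define a slightly stronger logical predicate $\mathcal{P}^{(n)}_{\tau}$ on $\Theta \times \mathbb{R}^n \to \llbracket \tau \rrbracket$, which
allows us to obtain a uniform upper bound:


1. $f \in \mathcal{P}^{(n)}_{\iota}$ if f is uniformly dominated by a function with finite moments
2. $f \in \mathcal{P}^{(n)}_{\tau_1 \bullet \Sigma_3 \to \tau_2}$ if for all $n_2 \in \mathbb{N}$ and $g \in \mathcal{P}^{(n + n_2)}_{\tau_1}$, $f \odot g \in \mathcal{P}^{(n + n_2 + |\Sigma_3|)}_{\tau_2}$


where for $f : \Theta \times \mathbb{R}^{n_1} \to \llbracket \tau_1 \bullet \Sigma_3 \to \tau_2 \rrbracket$ and $g : \Theta \times \mathbb{R}^{n_1 + n_2} \to \llbracket \tau_1 \rrbracket$ we define

\[
\begin{aligned}
f \odot g &: \Theta \times \mathbb{R}^{n_1 + n_2 + |\Sigma_3|} \to \llbracket \tau_2 \rrbracket \\
(\theta, s_1 +\!\!+ s_2 +\!\!+ s_3) &\mapsto f(\theta, s_1)(g(\theta, s_1 +\!\!+ s_2), s_3)
\end{aligned}
\]

Intuitively, g may depend on the samples in $s_2$ (in addition to $s_1$) and the function
application may consume further samples $s_3$ (as determined by the trace type
$\Sigma_3$). By induction on safe types we prove the following result, which is important
for conditionals: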


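Lemma 4. If $f \in \mathcal{P}^{(n)}_{\iota}$ and $g, h \in \mathcal{P}^{(n)}_{\sigma}$ then $[f(-) < 0] \cdot g + [f(-) \geq 0] \cdot h \in \mathcal{P}^{(n)}_{\sigma}$.

Proof. For base types it follows from Lemma 3. Hence, suppose $\sigma$ has the form
$\sigma_1 \bullet [] \to \sigma_2$. Let $n_2 \in \mathbb{N}$ and $x \in \mathcal{P}^{(n + n_2)}_{\sigma_1}$. By definition,
$(g \odot x), (h \odot x) \in \mathcal{P}^{(n + n_2)}_{\sigma_2}$. Let $\hat{f}$ be the extension (ignoring the additional
samples) of f to $\Theta \times \mathbb{R}^{n + n_2} \to \mathbb{R}$. It is easy to see that also $\hat{f} \in \mathcal{P}^{(n + n_2)}_{\iota}$. By the
inductive hypothesis,

\[
[\hat{f}(-) < 0] \cdot (g \odot x) + [\hat{f}(-) \geq 0] \cdot (h \odot x) \in \mathcal{P}^{(n + n_2)}_{\sigma_2}
\]

Finally, by definition,

\[
([f(-) < 0] \cdot g + [f(-) \geq 0] \cdot h) \odot x
= [\hat{f}(-) < 0] \cdot (g \odot x) + [\hat{f}(-) \geq 0] \cdot (h \odot x)
\]

Assumption 1 We assume that $\Theta \subseteq \llbracket \iota_1 \rrbracket \times \cdots \times \llbracket \iota_m \rrbracket$ is compact.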


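Lemma 5 (Fundamental). If $\theta, x_1 : \tau_1, \ldots, x_\ell : \tau_\ell \mid \Sigma \vdash_{\mathrm{poly}} M : \tau$, $n \in \mathbb{N}$,
$\xi_1 \in \mathcal{P}^{(n)}_{\tau_1}, \ldots, \xi_\ell \in \mathcal{P}^{(n)}_{\tau_\ell}$ then $\llbracket M \rrbracket * \langle \xi_1, \ldots, \xi_\ell \rangle \in \mathcal{P}^{(n + |\Sigma|)}_{\tau}$, where

\[
\begin{aligned}
\llbracket M \rrbracket * \langle \xi_1, \ldots, \xi_\ell \rangle &: \Theta \times \mathbb{R}^{n + |\Sigma|} \to \llbracket \tau \rrbracket \\
(\theta, s +\!\!+ s') &\mapsto \llbracket M \rrbracket ((\theta, \xi_1(\theta, s), \ldots, \xi_\ell(\theta, s)), s')
\end{aligned}
\]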


It is worth noting that, in contrast to more standard fundamental lemmas, here
we need to capture the dependency of the free variables on some number n of
further samples. E.g. in the context of $(\lambda x.\ x)\ \mathbf{sample}_{\mathcal{N}}$ the subterm x depends
on a sample although this is not apparent if we consider x in isolation.

Lemma 5 is proven by structural induction [18]. The most interesting cases
include: parameters, primitive operations and conditionals. In the case for
parameters we exploit the compactness of $\Theta$ (Assumption 1). For primitive operations
we note that as a consequence of Lemma 3 each $\mathcal{P}^{(n)}_{\iota}$ is closed under negation⁵,
addition and multiplication. Finally, for conditionals we exploit Lemma 4.


Type Soundness II: Correctness of SGD. Next, we address the integrability
for the smoothed problem as well as (SGD1) to (SGD3). We establish that not
only $\llbracket M \rrbracket_\eta$ but also its partial derivatives up to order 2 are uniformly dominated
by functions with finite moments. For this to possibly hold we require:


Assumption 2 For every $\eta > 0$,

\[
\sup_{x \in \mathbb{R}} |\sigma_\eta(x)| < \infty
\qquad
\sup_{x \in \mathbb{R}} |\sigma_\eta'(x)| < \infty
\qquad
\sup_{x \in \mathbb{R}} |\sigma_\eta''(x)| < \infty
\]

Note that, for example, the logistic sigmoid satisfies Assumption 2.

We can then prove a fundamental lemma similar to Lemma 5, mutatis mutandis,
using a logical predicate in VectFr. We stipulate $f \in \mathcal{Q}^{(n)}_{\iota}$ if its partial
derivatives up to order 2 are uniformly dominated by a function with finite
moments. In addition to Lemma 3 we exploit standard rules for differentiation (such
as the sum, product and chain rule) as well as Assumption 2. We conclude:


Proposition 4. If $\theta \mid \Sigma \vdash_{\mathrm{poly}} M : \mathbb{R}$ then the partial derivatives up to order 2
of $\llbracket M \rrbracket_\eta$ are uniformly dominated by a function with all finite moments.


Consequently, the Smoothed Optimisation Problem 2 is not only well-defined 
but, by the dominated convergence theorem [21, Theorem 6.28], the reparame- 
terisation gradient estimator is unbiased. Furthermore, (SGD1) to (SGD3) are 
satisfied and SGD is correct. 


Discussion. The type system $\vdash_{\mathrm{poly}}$ is simple yet guarantees correctness of SGD.
However, it is somewhat restrictive; in particular, it does not allow the expression
of many ELBOs arising in variational inference directly as they often have the
form of logarithms of exponential terms (cf. Example 2).


5 for $\iota = \mathbb{R}$


4.3 A Generic Type System with Annotations


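Next, we present a generic type system with annotations. In Section 4.4 we give
an instantiation to make $\vdash_{\mathrm{poly}}$ more permissible and in Section 5 we turn towards
a different property: the uniform convergence of the smoothings.

Typing judgements have the form $\Gamma \mid \Sigma \vdash_{?} M : \tau$, where “?” indicates
the property we aim to establish, and we annotate base types. Thus, types are
generated from

\[
\begin{aligned}
\text{trace types} \quad & \Sigma ::= [s_1 \sim \mathcal{D}_1, \ldots, s_n \sim \mathcal{D}_n] \\
\text{base types} \quad & \iota ::= \mathbb{R} \mid \mathbb{R}_{>0} \\
\text{safe types} \quad & \sigma ::= \iota^{\beta} \mid \sigma \bullet [] \to \sigma \\
\text{types} \quad & \tau ::= \iota^{\alpha} \mid \tau \bullet \Sigma \to \tau
\end{aligned}
\]

Annotations are drawn from a set and may possibly be restricted for safe types.
Secondly, the trace types are now annotated with variables, typically
$\Sigma = [s_1 \sim \mathcal{D}_1, \ldots, s_n \sim \mathcal{D}_n]$ where the variables $s_j$ are pairwise distinct.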

For the subtyping relation we can constrain the annotations at the base type 
level [18]; the extension to higher types is accomplished as before. 

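The typing rules have the same form but they are extended with the annotations
on base types and side conditions possibly constraining them. For example,
the rules for addition, exponentiation and sampling are modified as follows:

\[
\frac{}{\mid [] \vdash_{?} + : \iota^{\alpha_1} \to \iota^{\alpha_2} \to \iota^{\alpha}}\ (\text{cond. Add})
\qquad
\frac{}{\mid [] \vdash_{?} \exp : \mathbb{R}^{\alpha} \to \mathbb{R}_{>0}^{\alpha'}}\ (\text{cond. Exp})
\]
\[
\frac{}{\mid [s_j \sim \mathcal{D}] \vdash_{?} \mathbf{sample}_{\mathcal{D}} : \mathbb{R}^{\alpha}}\ (\text{cond. Sample})
\]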


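The rules for subtyping, variables, abstractions and applications do not need to
be changed at all but they use annotated types instead of the types of Section 2.2.

\[
\frac{\Gamma \mid \Sigma \vdash_{?} M : \tau}{\Gamma' \mid \Sigma \vdash_{?} M : \tau'}\ \Gamma \sqsubseteq_{?} \Gamma',\ \tau \sqsubseteq_{?} \tau'
\qquad
x : \tau \mid [] \vdash_{?} x : \tau
\]
\[
\frac{\Gamma, y : \tau_1 \mid \Sigma \vdash_{?} M : \tau_2}
     {\Gamma \mid [] \vdash_{?} \lambda y.\, M : \tau_1 \bullet \Sigma \to \tau_2}
\qquad
\frac{\Gamma \mid \Sigma_1 \vdash_{?} M : \tau_1 \bullet \Sigma_3 \to \tau_2 \quad \Gamma \mid \Sigma_2 \vdash_{?} N : \tau_1}
     {\Gamma \mid \Sigma_1 +\!\!+ \Sigma_2 +\!\!+ \Sigma_3 \vdash_{?} M\, N : \tau_2}
\]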

The full type system is presented in [18]. 

$\vdash_{\mathrm{poly}}$ can be considered a special case of $\vdash_{?}$ whereby we use the singleton *
as annotations, a contradictory side condition (such as false) for the undesired
primitives ${}^{-1}$, exp and log, and use the side condition “$\mathcal{D}$ has finite moments”
for sample as above.

Table 1 provides an overview of the type systems of this paper and their
purpose. $\vdash_{?}$ and its instantiations refine the basic type system of Section 2.2 in
the sense that if a term-in-context is provable in the annotated type system,
then its erasure (i.e. erasure of the annotations of base types and distributions)
is provable in the basic type system. This is straightforward to check.

Table 1: Overview of type systems in this paper.


property              Section       judgement                  annotation
totality              Section 2.2   $\vdash$                   -
correctness SGD       Section 4.2   $\vdash_{\mathrm{poly}}$   none/*
correctness SGD       Section 4.4   $\vdash_{\mathrm{SGD}}$    0/1
uniform convergence   Section 5.1   $\vdash_{\mathrm{unif}}$   (f, Δ)/(t, Δ)


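\[
\mid [] \vdash_{\mathrm{SGD}} \exp : \mathbb{R}^{(0)} \to \mathbb{R}_{>0}^{(1)}
\qquad
\mid [] \vdash_{\mathrm{SGD}} \log : \mathbb{R}_{>0}^{(e)} \to \mathbb{R}^{(0)}
\]
\[
\mid [] \vdash_{\mathrm{SGD}} \circ : \iota^{(0)} \to \iota^{(0)} \to \iota^{(0)}\ \ (\circ \in \{+, \cdot\})
\qquad
\mid [] \vdash_{\mathrm{SGD}} \cdot : \mathbb{R}_{>0}^{(e)} \to \mathbb{R}_{>0}^{(e)} \to \mathbb{R}_{>0}^{(e)}
\]
\[
\mid [] \vdash_{\mathrm{SGD}} - : \mathbb{R}^{(0)} \to \mathbb{R}^{(0)}
\qquad
\mid [] \vdash_{\mathrm{SGD}} {}^{-1} : \mathbb{R}_{>0}^{(e)} \to \mathbb{R}_{>0}^{(e)}
\]
\[
\frac{\Gamma \mid \Sigma \vdash_{\mathrm{SGD}} L : \iota^{(0)} \quad \Gamma \mid \Sigma' \vdash_{\mathrm{SGD}} M : \sigma \quad \Gamma \mid \Sigma'' \vdash_{\mathrm{SGD}} N : \sigma}
     {\Gamma \mid \Sigma +\!\!+ \Sigma' +\!\!+ \Sigma'' \vdash_{\mathrm{SGD}} \mathbf{if}\ L < 0\ \mathbf{then}\ M\ \mathbf{else}\ N : \sigma}
\]
\[
\frac{\mathcal{D} \text{ has finite moments}}
     {\mid [s_j \sim \mathcal{D}] \vdash_{\mathrm{SGD}} \mathbf{sample}_{\mathcal{D}} : \mathbb{R}^{(0)}}
\]

Fig. 4: Excerpt of the typing rules (cf. [18]) for the correctness of SGD.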


4.4 A More Permissible Type System 


In this section we discuss another instantiation, $\vdash_{\mathrm{SGD}}$, of the generic type
system to guarantee (SGD0) to (SGD3), which is more permissible than $\vdash_{\mathrm{poly}}$.
In particular, we would like to support Example 2, which uses logarithms and
densities involving exponentials. Intuitively, we need to ensure that subterms
involving exp are “neutralised” by a corresponding log. To achieve this we
annotate base types with 0 or 1, ordered discretely. 0 is the only annotation for
safe base types and can be thought of as “integrable”; 1 denotes “needs to be
passed through log”. More precisely, we constrain the typing rules such that if
$\theta \mid \Sigma \vdash_{\mathrm{SGD}} M : \iota^{(e)}$ then⁶ $\log^e \circ \llbracket M \rrbracket$ and the partial derivatives of $\log^e \circ \llbracket M \rrbracket_\eta$
up to order 2 are uniformly dominated by a function with finite moments.

We subtype base types as follows: $\iota_1^{(e_1)} \sqsubseteq_{\mathrm{SGD}} \iota_2^{(e_2)}$ if $\iota_1 \sqsubseteq \iota_2$ (as defined in
Fig. 3a) and $e_1 = e_2$, or $\iota_1 = \mathbb{R}_{>0} = \iota_2$ and $e_1 \leq e_2$. The second disjunct may
come as a surprise but we ensure that terms of type $\mathbb{R}_{>0}^{(0)}$ cannot depend on
samples at all.

In Fig. 4 we list the most important rules; we relegate the full type system to
[18]. exp and log increase and decrease the annotation respectively. The rules for
the primitive operations and conditionals are motivated by the closure properties
of Lemma 3 and the elementary fact that $\log \circ (f \cdot g) = (\log \circ f) + (\log \circ g)$ and
$\log \circ (f^{-1}) = -\log \circ f$ for $f, g : \Theta \times \mathbb{R}^n \to \mathbb{R}_{>0}$.


6 using the convention that $\log^0$ is the identity


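Example 4. $\theta : \mathbb{R}_{>0}^{(0)} \mid [\mathcal{N}, \mathcal{N}] \vdash_{\mathrm{SGD}} \log(\theta^{-1} \cdot \exp(\mathbf{sample}_{\mathcal{N}})) + \mathbf{sample}_{\mathcal{N}} : \mathbb{R}^{(0)}$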


Note that the branches of conditionals need to have safe type, which rules out
branches with type $\mathbb{R}_{>0}^{(1)}$. This is because logarithms do not behave nicely when
composed with addition as used in the smoothed interpretation of conditionals.

Besides, observe that in the rules for logarithm and inverses e = 0 is allowed,
which may come as a surprise⁷. This is e.g. necessary for the typability of the
variational inference Example 2:


Example 5 (Typing for Variational Inference). It holds
$\mid [] \vdash \mathcal{N} : \mathbb{R}^{(0)} \to \mathbb{R}^{(0)} \to \mathbb{R}_{>0}^{(0)} \to \mathbb{R}_{>0}^{(1)}$ and $\theta : \mathbb{R}^{(0)} \mid [s_1 \sim \mathcal{N}] \vdash M : \mathbb{R}^{(0)}$.


Type Soundness. To formally establish type soundness, we can use a logical
predicate, which is very similar to the one in Section 4.2 (N.B. the additional
Item 2): in particular $f \in \mathcal{Q}^{(n)}_{\iota^{(e)}}$ if


1. partial derivatives of $\log^e \circ f$ up to order 2 are uniformly dominated by a
function with finite moments
2. if $\iota^{(e)}$ is $\mathbb{R}_{>0}^{(0)}$ then f is dominated by a positive constant function


Using this and a similar logical predicate for $\llbracket (-) \rrbracket$ we can show:

Proposition 5. If $\theta_1 : \iota_1^{(0)}, \ldots, \theta_m : \iota_m^{(0)} \mid \Sigma \vdash_{\mathrm{SGD}} M : \iota^{(0)}$ then


1. all distributions in $\Sigma$ have finite moments
2. $\llbracket M \rrbracket$ and for each $\eta > 0$ the partial derivatives up to order 2 of $\llbracket M \rrbracket_\eta$ are
uniformly dominated by a function with finite moments.


Consequently, again the Smoothed Optimisation Problem 2 is not only well- 
defined but by the dominated convergence theorem, the reparameterisation gra- 
dient estimator is unbiased. Furthermore, (SGD1) to (SGD3) are satisfied and 
SGD is correct. 


5 Uniform Convergence 


In the preceding section we have shown that SGD with the reparameterisation 
gradient can be employed to correctly (in the sense of Proposition 3) solve the 
Smoothed Optimisation Problem 2 for any fixed accuracy coefficient. However, 
a priori, it is not clear how a solution of the Smoothed Problem 2 can help to 
solve the original Problem 1. 

The following illustrates the potential for significant discrepancies: 


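7 Recall that terms of type $\mathbb{R}_{>0}^{(0)}$ cannot depend on samples.

Example 6. Consider $M \equiv \mathbf{if}\ 0 < 0\ \mathbf{then}\ \theta \cdot \theta + 1\ \mathbf{else}\ (\theta - 1) \cdot (\theta - 1)$. Notice that
the global minimum and the only stationary point of $\llbracket M \rrbracket_\eta$ is at $\theta = \frac{1}{2}$ regardless
of $\eta > 0$, where $\llbracket M \rrbracket_\eta(\frac{1}{2}) = \frac{3}{4}$. On the other hand $\llbracket M \rrbracket(\frac{1}{2}) = \frac{1}{4}$ and the global
minimum of $\llbracket M \rrbracket$ is at $\theta = 1$.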
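
The numbers in Example 6 can be reproduced with a short Python sketch. It assumes the logistic sigmoid as smoothing, for which σ_η(0) = 1/2 for every η, so the smoothed objective is (θ·θ + 1)/2 + (θ - 1)·(θ - 1)/2 = θ·θ - θ + 1; the function names `exact` and `smoothed` are illustrative.

```python
import math

def sigma_eta(x, eta):
    """Logistic smoothing sigma(x / eta), written to avoid overflow."""
    z = x / eta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def exact(theta):
    """Unsmoothed denotation: the guard 0 < 0 is false, so the else branch is taken."""
    guard = 0.0
    return theta * theta + 1.0 if guard < 0 else (theta - 1.0) * (theta - 1.0)

def smoothed(theta, eta):
    """Smoothed denotation following Eq. (4): both branches are mixed with weight 1/2."""
    guard = 0.0
    return (sigma_eta(-guard, eta) * (theta * theta + 1.0)
            + sigma_eta(guard, eta) * (theta - 1.0) * (theta - 1.0))

# Locate the minimisers on a grid over [-1, 2] with spacing 0.001.
grid = [i / 1000.0 for i in range(-1000, 2001)]
argmin_smoothed = min(grid, key=lambda t: smoothed(t, 0.1))
argmin_exact = min(grid, key=exact)
```

On this grid the smoothed objective is minimised at 0.5 and the original one at 1.0, independently of the accuracy coefficient, which is the discrepancy the example describes.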


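In this section we investigate under which conditions the smoothed objective
function converges to the original objective function uniformly in $\theta \in \Theta$:

\[
\text{(Unif)} \qquad
\mathbb{E}_{s \sim \mathcal{D}} [\llbracket M \rrbracket_\eta (\theta, s)] \xrightarrow{\text{unif.}} \mathbb{E}_{s \sim \mathcal{D}} [\llbracket M \rrbracket (\theta, s)]
\quad \text{as } \eta \searrow 0 \text{ for } \theta \in \Theta
\]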


We design a type system guaranteeing this. 

The practical significance of uniform convergence is that before running SGD,
for every error tolerance $\epsilon > 0$ we can find an accuracy coefficient $\eta > 0$ such
that the difference between the smoothed and original objective function does
not exceed $\epsilon$, in particular for $\theta^*$ delivered by the SGD run for the η-smoothed
problem.


Discussion of Restrictions. To rule out the pathology of Example 6 we require
that guards are non-0 almost everywhere.

Furthermore, as a consequence of the uniform limit theorem [29], (Unif)
can only possibly hold if the expectation $\mathbb{E}_{s \sim \mathcal{D}} [\llbracket M \rrbracket (\theta, s)]$ is continuous (as
a function of the parameters $\theta$). For a straightforward counterexample take
$M \equiv \mathbf{if}\ \theta < 0\ \mathbf{then}\ 0\ \mathbf{else}\ 1$; we have $\mathbb{E}_s [\llbracket M \rrbracket (\theta)] = [\theta \geq 0]$, which is not even
continuous, let alone differentiable, at $\theta = 0$. Our approach is to require that guards
do not depend directly on parameters but they may do so, indirectly, via a
diffeomorphic⁸ reparameterisation transform; see Example 8. We call such guards
safe.

In summary, our aim, intuitively, is to ensure that guards are the composition
of a diffeomorphic transformation of the random samples (potentially depending
on parameters) and a function which does not vanish almost everywhere.


5.1 Type System for Guard Safety 


In order to enforce this requirement and to make the transformation more
explicit, we introduce syntactic sugar, $\mathbf{transform\ sample}_{\mathcal{D}}\ \mathbf{by}\ T$, for applications
of the form $T\ \mathbf{sample}_{\mathcal{D}}$.


Example 7. As expressed in Eq. (2), we can obtain samples from $\mathcal{N}(\mu, \sigma^2)$ via
$\mathbf{transform\ sample}_{\mathcal{N}}\ \mathbf{by}\ (\lambda s.\ s \cdot \sigma + \mu)$, which is syntactic sugar for the term
$(\lambda s.\ s \cdot \sigma + \mu)\ \mathbf{sample}_{\mathcal{N}}$.

We propose another instance of the generic type system of Section 4.3, $\vdash_{\mathrm{unif}}$,
where we annotate base types by $\alpha = (g, \Delta)$, where $g \in \{\mathbf{f}, \mathbf{t}\}$ denotes whether
we seek to establish guard safety and $\Delta$ is a finite set of $s_j$ capturing possible
dependencies on samples. We subtype base types as follows:
$\iota_1^{(g_1, \Delta_1)} \sqsubseteq_{\mathrm{unif}} \iota_2^{(g_2, \Delta_2)}$ if $\iota_1 \sqsubseteq \iota_2$ (as defined in Fig. 3a), $\Delta_1 \subseteq \Delta_2$ and
$g_1 \leq g_2$, where $\mathbf{t} \leq \mathbf{f}$. This is motivated by the intuition that we can always
drop⁹ guard safety and add more dependencies.


8 [18, Example 12] illustrates why it is not sufficient to restrict the reparameterisation
transform to bijections (rather, we require it to be a diffeomorphism).


The rule for conditionals ensures that only safe guards are used. The unary
operations preserve variable dependencies and guard safety. Parameters and
constants are not guard safe and depend on no samples (see [18] for the full type
system):


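\[
\frac{\Gamma \mid \Sigma \vdash_{\mathrm{unif}} L : \iota^{(\mathbf{t}, \Delta)} \quad \Gamma \mid \Sigma' \vdash_{\mathrm{unif}} M : \sigma \quad \Gamma \mid \Sigma'' \vdash_{\mathrm{unif}} N : \sigma}
     {\Gamma \mid \Sigma +\!\!+ \Sigma' +\!\!+ \Sigma'' \vdash_{\mathrm{unif}} \mathbf{if}\ L < 0\ \mathbf{then}\ M\ \mathbf{else}\ N : \sigma}
\]
\[
\mid [] \vdash_{\mathrm{unif}} - : \mathbb{R}^{(g, \Delta)} \to \mathbb{R}^{(g, \Delta)}
\]
\[
\theta_i : \iota^{(\mathbf{f}, \emptyset)} \mid [] \vdash_{\mathrm{unif}} \theta_i : \iota^{(\mathbf{f}, \emptyset)}
\qquad
\mid [] \vdash_{\mathrm{unif}} r : \iota^{(\mathbf{f}, \emptyset)} \quad (r \in \llbracket \iota \rrbracket)
\]
\[
\frac{\theta \mid [] \vdash_{\mathrm{unif}} T : \mathbb{R}^{\alpha} \to \mathbb{R}^{\alpha}}
     {\theta \mid [s_j \sim \mathcal{D}] \vdash_{\mathrm{unif}} \mathbf{transform\ sample}_{\mathcal{D}}\ \mathbf{by}\ T : \mathbb{R}^{(\mathbf{t}, \{s_j\})}}\ T \text{ diffeomorphic}
\]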


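A term $\theta \mid [] \vdash_{\mathrm{unif}} T : \mathbb{R}^{\alpha} \to \mathbb{R}^{\alpha}$ is diffeomorphic if
$\llbracket T \rrbracket (\theta, []) = \llbracket T \rrbracket_\eta (\theta, []) : \mathbb{R} \to \mathbb{R}$ is a diffeomorphism for each $\theta \in \Theta$, i.e.
differentiable and bijective with differentiable inverse.

First, we can express affine transformations, in particular, the location-scale
transformations as in Example 7:


Example 8 (Location-Scale Transformation). The term-in-context

\[
\sigma : \mathbb{R}_{>0}^{(\mathbf{f}, \emptyset)}, \mu : \mathbb{R}^{(\mathbf{f}, \emptyset)} \mid [] \vdash \lambda s.\ \sigma \cdot s + \mu : \mathbb{R}^{(\mathbf{f}, \{s_1\})} \to \mathbb{R}^{(\mathbf{f}, \{s_1\})}
\]

is diffeomorphic. (However for $\sigma : \mathbb{R}^{(\mathbf{f}, \emptyset)}$ it is not because it admits $\sigma = 0$.)
Hence, the reparameterisation transform

\[
G \equiv \sigma : \mathbb{R}_{>0}^{(\mathbf{f}, \emptyset)}, \mu : \mathbb{R}^{(\mathbf{f}, \emptyset)} \mid [s_1 \sim \mathcal{D}] \vdash \mathbf{transform\ sample}_{\mathcal{D}}\ \mathbf{by}\ (\lambda s.\ s \cdot \sigma + \mu) : \mathbb{R}^{(\mathbf{t}, \{s_1\})}
\]

which has g-flag t, is admissible as a guard term. Notice that G depends on the
parameters, $\sigma$ and $\mu$, indirectly through a diffeomorphism, which is permitted
by the type system.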


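If guard safety is sought to be established for the binary operations, we
require that operands do not share dependencies on samples:

\[
\mid [] \vdash_{\mathrm{unif}} \circ : \iota^{(\mathbf{f}, \Delta)} \to \iota^{(\mathbf{f}, \Delta)} \to \iota^{(\mathbf{f}, \Delta)}
\qquad \circ \in \{+, \cdot\}
\]
\[
\mid [] \vdash_{\mathrm{unif}} \circ : \iota^{(\mathbf{t}, \Delta_1)} \to \iota^{(\mathbf{t}, \Delta_2)} \to \iota^{(\mathbf{t}, \Delta_1 \cup \Delta_2)}
\qquad \circ \in \{+, \cdot\},\ \Delta_1 \cap \Delta_2 = \emptyset
\]

This is designed to address:


9 as long as it is not used in guards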




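Example 9 (Non-Constant Guards). We have
$\mid [] \vdash (\lambda x.\ x + (-x)) : \mathbb{R}^{(\mathbf{f}, \{s_1\})} \to \mathbb{R}^{(\mathbf{f}, \{s_1\})}$, noting that we must use g = f for the
+ rule; and because $\mathbb{R}^{(\mathbf{t}, \{s_j\})} \sqsubseteq_{\mathrm{unif}} \mathbb{R}^{(\mathbf{f}, \{s_j\})}$, we have

\[
\mid [] \vdash (\lambda x.\ x + (-x)) : \mathbb{R}^{(\mathbf{t}, \{s_1\})} \to \mathbb{R}^{(\mathbf{f}, \{s_1\})}.
\]

Now $\mathbf{transform\ sample}_{\mathcal{D}}\ \mathbf{by}\ (\lambda y.\ y)$ has type $\mathbb{R}^{(\mathbf{t}, \{s_1\})}$ with the g-flag necessarily
set to t; and so the term

\[
M \equiv (\lambda x.\ x + (-x))\ \mathbf{transform\ sample}_{\mathcal{D}}\ \mathbf{by}\ (\lambda y.\ y)
\]

which denotes 0, has type $\mathbb{R}^{(\mathbf{f}, \{s_1\})}$, but not $\mathbb{R}^{(\mathbf{t}, \{s_1\})}$. It follows that M cannot
be used in guards (notice the side condition of the rule for conditional), which
is as desired: recall Example 6. Similarly consider the term

\[
N \equiv (\lambda x.\ (\lambda y\ z.\ \mathbf{if}\ y + (-z) < 0\ \mathbf{then}\ M_1\ \mathbf{else}\ M_2)\ x\ x)\ (\mathbf{transform\ sample}_{\mathcal{D}}\ \mathbf{by}\ (\lambda y.\ y))
\tag{7}
\]

When evaluated, the term $y + (-z)$ in the guard has denotation 0. For the same
reason as above, the term N is not refinement typable.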


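The type system is however incomplete, in the sense that there are
terms-in-context that satisfy the property (Unif) but which are not typable.


Example 10 (Incompleteness). The following term-in-context denotes the
“identity”:

\[
\mid [] \vdash (\lambda x.\ (2 \cdot x) + (-x)) : \mathbb{R}^{(\mathbf{t}, \{s_1\})} \to \mathbb{R}^{(\mathbf{f}, \{s_1\})}
\]

but it does not have type $\mathbb{R}^{(\mathbf{t}, \{s_1\})} \to \mathbb{R}^{(\mathbf{t}, \{s_1\})}$. Then, using the same reasoning
as Example 9, the term

\[
G \equiv (\lambda x.\ (2 \cdot x) + (-x))\ (\mathbf{transform\ sample}_{\mathcal{D}}\ \mathbf{by}\ (\lambda y.\ y))
\]

has type $\mathbb{R}^{(\mathbf{f}, \{s_1\})}$, but not $\mathbb{R}^{(\mathbf{t}, \{s_1\})}$, and so $\mathbf{if}\ G < 0\ \mathbf{then}\ 0\ \mathbf{else}\ 1$ is not typable,
even though G can safely be used in guards.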


5.2 Type Soundness 


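Henceforth, we fix parameters $\theta_1 : \iota_1^{(\mathbf{f}, \emptyset)}, \ldots, \theta_m : \iota_m^{(\mathbf{f}, \emptyset)}$.

Now, we address how to show property (Unif), i.e. that for
$\theta \mid \Sigma \vdash_{\mathrm{unif}} M : \iota^{(g, \Delta)}$, the η-smoothed $\mathbb{E}[\llbracket M \rrbracket_\eta (\theta, s)]$ converges uniformly for
$\theta \in \Theta$ as $\eta \searrow 0$. For this to hold we clearly need to require that $\sigma_\eta$ has good
(uniform) convergence properties (as far as the unavoidable discontinuity at 0
allows for):


Assumption 3 For every $\delta > 0$, $\sigma_\eta \xrightarrow{\text{unif.}} [(-) > 0]$ on $(-\infty, -\delta) \cup (\delta, \infty)$.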


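Observe that in general even if M is typable $\llbracket M \rrbracket_\eta$ does not converge uniformly
in both $\theta$ and s because $\llbracket M \rrbracket$ may still be discontinuous in s:

Example 11. For $M \equiv \mathbf{if}\ (\mathbf{transform\ sample}_{\mathcal{N}}\ \mathbf{by}\ (\lambda s.\ s + \theta)) < 0\ \mathbf{then}\ 0\ \mathbf{else}\ 1$,
$\llbracket M \rrbracket (\theta, s) = [s + \theta \geq 0]$, which is discontinuous, and $\llbracket M \rrbracket_\eta (\theta, s) = \sigma_\eta(s + \theta)$.


However, if $\theta \mid \Sigma \vdash M : \iota^{(g, \Delta)}$ then $\llbracket M \rrbracket_\eta$ does converge to $\llbracket M \rrbracket$ uniformly
almost uniformly, i.e., uniformly in $\theta \in \Theta$ and almost uniformly in $s \in \mathbb{R}^n$.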
Formally, we define: 


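Definition 4. Let $f, f_\eta : \Theta \times \mathbb{R}^n \to \mathbb{R}$, $\mu$ be a measure on $\mathbb{R}^n$. We say that $f_\eta$
converges uniformly almost uniformly to f (notation: $f_\eta \xrightarrow{\text{u.a.u.}} f$) if there exist
sequences $(\delta_k)_{k \in \mathbb{N}}$, $(\epsilon_k)_{k \in \mathbb{N}}$ and $(\eta_k)_{k \in \mathbb{N}}$ such that
$\lim_{k \to \infty} \delta_k = 0 = \lim_{k \to \infty} \epsilon_k$; and for every $k \in \mathbb{N}$ and $\theta \in \Theta$ there exists
$U \subseteq \mathbb{R}^n$ such that


1. $\mu(U) < \delta_k$ and
2. for every $0 < \eta < \eta_k$ and $s \in \mathbb{R}^n \setminus U$, $|f_\eta(\theta, s) - f(\theta, s)| < \epsilon_k$.


If $f, f_\eta$ are independent of $\theta$ this notion coincides with standard almost uniform
convergence. For M from Example 11 $\llbracket M \rrbracket_\eta \xrightarrow{\text{u.a.u.}} \llbracket M \rrbracket$ holds although uniform
convergence fails.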
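
For the term of Example 11 the difference between the two notions of convergence can be observed numerically. The following Python sketch assumes the logistic sigmoid as smoothing; the helper `sup_error`, which takes the largest error over grid points outside the exceptional set {s : |s + θ| ≤ δ}, is a hypothetical aid for this illustration.

```python
import math

def sigma_eta(x, eta):
    """Logistic smoothing sigma(x / eta), written to avoid overflow."""
    z = x / eta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def m_exact(theta, s):
    """Denotation of `if s + theta < 0 then 0 else 1`, i.e. [s + theta >= 0]."""
    return 1.0 if s + theta >= 0 else 0.0

def m_smooth(theta, s, eta):
    """Its eta-smoothing sigma_eta(s + theta)."""
    return sigma_eta(s + theta, eta)

def sup_error(theta, eta, delta, grid):
    """Largest error over grid points s with |s + theta| > delta."""
    return max(abs(m_smooth(theta, s, eta) - m_exact(theta, s))
               for s in grid if abs(s + theta) > delta)

grid = [i / 1000.0 for i in range(-3000, 3001)]
theta = 0.25
err_at_jump = abs(m_smooth(theta, -theta, 1e-3) - m_exact(theta, -theta))
err_outside = sup_error(theta, eta=0.01, delta=0.1, grid=grid)
```

At the jump s = -θ the error equals 1/2 for every accuracy coefficient, so convergence cannot be uniform in s. Outside the δ-neighbourhood of the jump the error is bounded by σ_η(-δ), which tends to 0 as η decreases, and for a distribution with a density the excluded neighbourhood has small measure when δ is small.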

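However, uniform almost uniform convergence entails uniform convergence
of expectations:


Lemma 6. Let $f, f_\eta : \Theta \times \mathbb{R}^n \to \mathbb{R}$ have finite moments.
If $f_\eta \xrightarrow{\text{u.a.u.}} f$ then $\mathbb{E}_{s \sim \mathcal{N}}[f_\eta(\theta, s)] \xrightarrow{\text{unif.}} \mathbb{E}_{s \sim \mathcal{N}}[f(\theta, s)]$.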


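As a consequence, it suffices to establish $\llbracket M \rrbracket_\eta \xrightarrow{\text{u.a.u.}} \llbracket M \rrbracket$. We achieve this by
positing an infinitary logical relation between sequences of morphisms in VectFr
(corresponding to the smoothings) and morphisms in QBS (corresponding to the
measurable standard semantics). We then prove a fundamental lemma (details
are in [18]). Not surprisingly the case for conditionals is most interesting. This
makes use of Assumption 3 and exploits that guards, for which the typing rules
assert the guard safety flag to be t, can only be 0 at sets of measure 0. We
conclude:


Theorem 1. If $\theta_1 : \iota_1^{(\mathbf{f}, \emptyset)}, \ldots, \theta_m : \iota_m^{(\mathbf{f}, \emptyset)} \mid \Sigma \vdash_{\mathrm{unif}} M : \mathbb{R}^{(g, \Delta)}$ then
$\llbracket M \rrbracket_\eta \xrightarrow{\text{u.a.u.}} \llbracket M \rrbracket$. In particular, if $\llbracket M \rrbracket_\eta$ and $\llbracket M \rrbracket$ also have finite moments then

\[
\mathbb{E}_{s \sim \mathcal{D}} [\llbracket M \rrbracket_\eta (\theta, s)] \xrightarrow{\text{unif.}} \mathbb{E}_{s \sim \mathcal{D}} [\llbracket M \rrbracket (\theta, s)]
\quad \text{as } \eta \searrow 0 \text{ for } \theta \in \Theta
\]

We finally note that $\vdash_{\mathrm{unif}}$ can be made more permissible by adding syntactic
sugar for a-fold (for $a \in \mathbb{N}_{>0}$) addition $a \cdot M \equiv M + \cdots + M$ and multiplication
$M^a \equiv M \cdot \ldots \cdot M$. This admits more terms as guards, but safely [18].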


6 Related Work 


[23] is both the starting point for our work and the most natural source for
comparison. They correct the (biased) reparameterisation gradient estimator for
non-differentiable models by additional non-trivial boundary terms. They present
an efficient method for affine guards only. Besides, they are not concerned with
the convergence of gradient-based optimisation procedures; nor do they discuss
how assumptions they make may be manifested in a programming language.

In the context of the reparameterisation gradient, [25] and [17] relax discrete 
random variables in a continuous way, effectively dealing with a specific class of 
discontinuous models. [39] use a similar smoothing for discontinuous optimisation 
but they do not consider a full programming language. 

Motivated by guaranteeing absolute continuity (which is a necessary but not 
sufficient criterion for the correctness of e.g. variational inference), [24] use an 
approach similar to our trace types to track the samples which are drawn. They 
do not support standard conditionals but their “work-around” is also eager in the 
sense of combining the traces of both branches. Besides, they do not support a 
full higher-order language, in which higher-order terms can draw samples. Thus, 
they do not need to consider function types tracking the samples drawn during 
evaluation. 


7 Empirical Evaluation 


We evaluate our smoothed gradient estimator (SMOOTH) against the biased
reparameterisation estimator (REPARAM), the unbiased correction of it (LYY18)
due to [23], and the unbiased (SCORE) estimator [31,38,27]. The experimental
setup is based on that of [23]. The implementation is written in Python, using 
automatic differentiation (provided by the jax library) to implement each of 
the above estimators for an arbitrary probabilistic program. For each estima- 
tor and model, we used the Adam [19] optimiser for 10,000 iterations using a 
learning rate of 0.001, with the exception of xornet for which we used 0.01. 
The initial model parameters $\theta_0$ were fixed for each model across all runs. In
each iteration, we used N = 16 Monte Carlo samples from the gradient
estimator. For the LYY18 estimator, a single subsample for the boundary term was
used in each estimate. For our smoothed estimator we use accuracy coefficients
$\eta \in \{0.1, 0.15, 0.2\}$. Further details are discussed in [18, Appendix E.1].


Compilation for First-Order Programs. All our benchmarks are first-order. We
compile a potentially discontinuous program to a smooth program
(parameterised by $\sigma_\eta$) using the compatible closure of


$$\textbf{if } L < 0 \textbf{ then } M \textbf{ else } N \;\rightsquigarrow\; (\lambda w.\, \sigma_\eta(-w) \cdot M + \sigma_\eta(w) \cdot N)\, L$$


Note that the size only increases linearly and that we avoid an exponential
blow-up by using abstractions rather than duplicating the guard L.
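
The rewrite rule above can be illustrated with a minimal Python sketch. It assumes that the smoothing function is the logistic sigmoid rescaled by the accuracy coefficient, i.e. sigma_eta(x) = sigmoid(x / eta); this choice, and the names `sigmoid`, `sigma_eta` and `smooth_if`, are illustrative assumptions and are not taken from the implementation evaluated here.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigma_eta(x, eta):
    """Assumed smoothing function: the sigmoid rescaled by the accuracy coefficient."""
    return sigmoid(x / eta)


def smooth_if(guard, then_value, else_value, eta):
    """Smoothed counterpart of `then_value if guard < 0 else else_value`."""
    w = guard  # the guard is bound once, mirroring the abstraction applied to L
    return sigma_eta(-w, eta) * then_value + sigma_eta(w, eta) * else_value
```

Both branch values are combined eagerly, and the guard appears only once in the compiled term, which is why nesting conditionals does not duplicate guards. Far from the boundary the result is close to the selected branch, and at the boundary it is the average of the two.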


Models. We include the models from [23], an example from differential privacy 
[11] and a neural network for which our main competitor, the estimator of [23], 
is not applicable (see [18, Appendix E.2] for more details). 

Fig. 5: ELBO trajectories for each model. A single colour is used for each
estimator and the accuracy coefficient $\eta$ = 0.1, 0.15, 0.2 for SMOOTH is represented
by dashed, solid and dotted lines respectively.


Analysis of Results 


We plot the ELBO trajectories in Fig. 5 and include data on the computational 
cost and work-normalised variance [8] in [18, Table 2]. (Variances can be im- 
proved in a routine fashion by e.g. taking more samples.) 

The ELBO graphs for the temperature model in Fig. 5a and the cheating
model in Fig. 5d show that the REPARAM estimator is biased, converging to
suboptimal values when compared to the SMOOTH and LYY18 estimators. For
temperature we can also see from the graph and the data in [18, Table 2a] that
the SCORE estimator exhibits extremely high variance, and does not converge.

Finally, the xornet model shows the difficulty of training step-function based 
neural nets. The LYY18 estimator is not applicable here since there are non-affine
conditionals. In Fig. 5e, the REPARAM estimator makes no progress while other 
estimators manage to converge to close to 0 ELBO, showing that they learn a 
network that correctly classifies all points. In particular, the SMOOTH estimator 
converges the quickest. 

In summary, the results reveal where the REPARAM estimator is biased and
that the SMOOTH estimator does not have the same limitation. Where the LYY18
estimator is defined, the two converge to roughly the same objective value.
Our smoothing approach generalises to more complex models such as neural
networks with non-linear boundaries, and it is simpler and cheaper (there
is no need to compute a correction term). Besides, our estimator has consistently
and significantly lower work-normalised variance, by up to 3 orders of magnitude.


8 Conclusion and Future Directions 


We have discussed a simple probabilistic programming language to formalise 
an optimisation problem arising e.g. in variational inference for probabilistic 
programming. We have endowed our language with a denotational (measurable) 
value semantics and a smoothed approximation of potentially discontinuous pro- 
grams, which is parameterised by an accuracy coefficient. We have proposed 
type systems to guarantee pleasing properties in the context of the optimisation 
problem: For a fixed accuracy coefficient, stochastic gradient descent converges 
to stationary points even with the reparameterisation gradient (which is unbi- 
ased). Besides, the smoothed objective function converges uniformly to the true 
objective as the accuracy is improved. 

Our type systems can be used to independently check these two properties 
to obtain partial theoretical guarantees even if one of the systems suffers from 
incompleteness. We also stress that SGD and the smoothed unbiased gradient 
estimator can even be applied to programs which are not typable. 

Experiments with our prototype implementation confirm the benefits of re- 
duced variance and unbiasedness. Compared to the unbiased correction of the 
reparameterised gradient estimator due to [23], our estimator has a similar
convergence, but is simpler, faster, and attains orders of magnitude (2 to 3,000×)
reduction in work-normalised variance.


Future Directions. A natural avenue for future research is to make the language 
and type systems more complete, i.e. to support more well-behaved programs, 
in particular programs involving recursion. 

Furthermore, the choice of accuracy coefficients leaves room for further in- 
vestigations. We anticipate it could be fruitful not to fix an accuracy coefficient 
upfront but to gradually enhance it during the optimisation either via a pre- 
determined schedule (dependent on structural properties of the program), or 
adaptively. 

Abstract. Variational Quantum Algorithms are hybrid classical-quantum
algorithms where classical and quantum computation work in tandem to 
solve computational problems. These algorithms create interesting chal- 
lenges for the design of suitable programming languages. In this paper 
we introduce Qimaera, which is a set of libraries for the Idris 2 pro- 
gramming language that enable the programmer to implement hybrid 
classical-quantum algorithms where the full power of the elegant Idris 
language works in synchrony with quantum programming primitives. The 
two key ingredients of Idris that make this possible are (1) dependent 
types which allow us to implement unitary quantum operations; and (2) 
linearity which allows us to enforce fine-grained control over the exe- 
cution of quantum operations so that we may detect and reject many 
physically inadmissible programs. We also show that Qimaera is suitable 
for variational quantum programming by providing implementations of 
two prominent variational quantum algorithms — QAOA and VQE. 


1 Introduction 


Variational Quantum Algorithms [30,25,13] present a computational paradigm 
where hybrid classical-quantum algorithms work in tandem to solve computa- 
tional problems. The classical part of the algorithm is performed by a classical 
processor and the quantum part of the algorithm is executed on a quantum 
device. During the computation, intermediate results produced by the quantum
device are passed on to the classical device, which performs further computation
on them in order to tune the parameters of the quantum part of the algorithm
and thereby influence the quantum dynamics. This hybrid classical-quantum
back-and-forth process repeats until a desired termination condition is
satisfied.

This hybrid classical-quantum computational paradigm opens up interesting
and important challenges for the design of suitable programming languages. It
is clear that if we wish to program within such computational scenarios, we
need to develop a language that correctly models the manipulation of quantum
resources. In particular, quantum measurements give rise to probabilistic
computational effects that are inherited by the classical side of the language.
Another issue is that quantum information behaves very differently from
classical information. As an example, quantum information cannot be copied
in a uniform way [36], unlike classical information, which may be freely copied
without restriction. Therefore, if we wish to avoid runtime errors, the quantum
fragment of the language needs to be equipped with features for fine-grained
control, such as a substructural typing discipline [16,8,7,24,6] in which
contraction (i.e., copying) is restricted. On the other hand, when doing
classical computation, such restrictions are unnecessary and often inconvenient.
One solution to this problem is to design a language with a classical (non-linear)
fragment together with a quantum (linear) one, both of which interact nicely
with each other. In fact, this can be achieved within an existing language that
has a sufficiently advanced type system, as we show in this paper.

Source code for Qimaera [1] and a full version of the paper [12] are available.


In this paper, we describe Qimaera (named after the hybrid creature Chi- 
maera from Greek mythology), which is a set of libraries for the Idris 2 lan- 
guage [10] that allow the programmer to implement hybrid quantum-classical 
algorithms in a type-safe way. Idris 2 is an elegant functional programming lan- 
guage that is equipped with an advanced type system based on Quantitative 
Type Theory [24,6] that brings many useful features to the programmer, most 
notably dependent types and linearity. These two features of Idris are crucial 
for the development of Qimaera and, in fact, are the reason we chose Idris in 
the first place. Dependent types are used throughout our entire development in 
order to correctly represent and formalise the compositional nature of quantum 
operations. Linearity is used in order to enforce the proper consumption of quan- 
tum resources (during execution) in a way that is admissible with respect to the 
laws of quantum mechanics. The combination of dependent types and linearity 
allows us to statically detect and reject erroneous quantum programs and this 
ensures the type safety of our approach to variational quantum programming. 


In our intended computational scenario, we have access to both a classical
computer and a quantum computer. Since we cannot directly observe quantum
information, we interact directly with the classical computer, which sends
instructions to, and receives data from, the quantum device. In our view, this is
a representation of a (perhaps simple) computational environment for hybrid
quantum-classical programming. We design a suitable (abstract) interface that
models this situation accurately and makes use of the IO monad. However, since
the authors do not personally have any quantum hardware, we provide only one
concrete implementation of our interface, which simulates the relevant quantum
operations on our classical computers by using the proper linear-algebraic
formalism, while still using the IO monad as prescribed by the abstract
interface. From a high-level programming perspective, the abstract interface
addresses the programming challenges induced by the classical-quantum device
scenario, but it ignores lower-level considerations (e.g., error correction).


Type-safe Quantum Programming in Idris


We emphasise that we can achieve type-safe hybrid quantum-classical pro- 
gramming in an existing programming language by implementing suitable li- 
braries. This is important for variational quantum programming, because in 
most variational quantum algorithms, the classical part of the algorithm is con- 
siderably larger, more complicated and more difficult to implement, compared to 
the quantum part of the algorithm. Therefore, it is important for the program- 
ming language to have first-class support for classical programming features. We 
think our chosen language, Idris, is such a language. The advanced type system 
of Idris allows us to elegantly mix quantum and classical programming prim- 
itives and therefore allows us to achieve our objectives. We demonstrate that 
Qimaera is suitable for variational quantum programming by providing imple- 
mentations of the two most prominent variational quantum algorithms - QAOA 
and VQE. Moreover, our implementation of these algorithms has been achieved 
in a type-safe programming framework. By this we mean that common quan- 
tum programming errors (copying of qubits, applying a CNOT operation with 
the same source and target, etc.) are statically detected and rejected by the 
Idris type checker. We also note that being able to combine quantum and clas- 
sical programming is important in other scenarios too (for instance in quantum 
cryptography). 


Quantum Circuits vs Recursive Quantum Programs. We want to stress 
that the focus of our paper is not about quantum circuits, but about (recur- 
sive) quantum programs and algorithms. While some quantum algorithms may 
be seen as quantum circuits, there are algorithms which are more general, for 
example, repeat-until-success (see §5.2) and variational quantum algorithms (see 
§6). Such algorithms are not quantum circuits in the traditional understanding 
of this notion, and for them general recursion, probabilistic effects and classical 
computation might be important. 

More specifically, general recursion is important, because many existing quan- 
tum algorithms are probabilistic and find the correct answer with some proba- 
bility. General recursion then allows the programmer to repeatedly run such an 
algorithm until the correct solution is found, thereby resulting in an almost- 
surely-terminating program, i.e., a program that terminates with probability 1. 
However, since there is no upper bound on the number of runs of the algorithm, 
general recursion is necessary to express this pattern. For instance, this can be 
used to repeatedly run Shor’s algorithm until the algorithm succeeds in finding 
a divisor. This might also be useful for variational quantum algorithms, because 
it allows us to express more flexible termination conditions, which give us more 
than simple iterations. 
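
The repeat-until-success pattern described above can be sketched in a few lines of Python. This is only an illustration of the control flow: `noisy_attempt` is a hypothetical stand-in for a probabilistic quantum procedure, and an unbounded loop plays the role that general recursion plays in a functional language.

```python
import random


def repeat_until_success(attempt, rng):
    """Run `attempt` until it reports success; return (result, number_of_runs)."""
    runs = 0
    while True:  # no upper bound on the number of runs is known in advance
        runs += 1
        succeeded, result = attempt(rng)
        if succeeded:
            return result, runs


def noisy_attempt(rng):
    """Hypothetical probabilistic procedure that succeeds with probability 1/4."""
    return rng.random() < 0.25, "answer"
```

With a success probability bounded away from zero, the loop terminates with probability 1, although no fixed number of iterations suffices for every run.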


Safety Properties. We consider type safety in quantum programming to be 
important, because it is easy to make mistakes where one can copy qubits or 
forget to use a qubit. The former is physically inadmissible due to the no-cloning 
theorem of quantum mechanics [36] and the latter usually leads to unexpected 
behaviour, because discarding quantum information causes a side effect that 
may affect the rest of the quantum system. These observations suggest that
we may design our systems and libraries carefully, by utilising linear typing 
features, so that these situations can be statically detected and rejected by the 
type system, therefore avoiding the problem. Otherwise, such situations could 
result in runtime errors (e.g., copying a qubit), which are clearly undesirable. In 
fact, in our experience, it is very easy to make such mistakes and this happened 
while we were implementing some of the quantum algorithms described in this 
paper. Our type-safe approach to quantum programming automatically detects 
and rejects these kinds of erroneous programs during type checking. While we 
do not have any proof of correctness, we believe that our approach is type-safe 
as long as the users do not modify our library files. 


Why Idris instead of another language? The features that we require to 
achieve our objectives are: general recursion, dependent types and linearity. We 
chose Idris 2, because it is an excellent language that has all three of these fea- 
tures. Removing general recursion limits the expressivity of the language (as 
explained above). The other two features are used to reject erroneous quan- 
tum programs. We think that most programming languages that have the three 
features mentioned above are suitable for type-safe hybrid quantum-classical 
programming. In fact, one of the main points that we wish to demonstrate with 
this paper is that it is not necessary to build a standalone programming lan- 
guage in order to achieve the desired safety properties. Instead, the same can 
be achieved with already existing languages, such as Idris 2. This approach has 
some advantages (compared to designing a standalone language), such as: easier 
maintenance, larger library support, better integration with the newest develop- 
ments in classical programming, etc. 


2 Background on Quantum Computation 


Readers interested in a detailed introduction to quantum computing may consult 
[26]. In this section we summarise the basic notions that are relevant for our 
development. 

The simplest non-trivial quantum system is the quantum bit, often abbreviated
as qubit. Qubits may be thought of as the quantum counterparts of the bit
from classical computation. A qubit $|\psi\rangle$ is represented as a normalised vector
in $\mathbb{C}^2$. The computational basis is given by the pair of vectors
$|0\rangle \stackrel{\text{def}}{=} \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and
$|1\rangle \stackrel{\text{def}}{=} \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, which may be seen as representing the classical bits 0 and 1. An
arbitrary qubit is described by $|\psi\rangle = a|0\rangle + b|1\rangle$ where $a, b \in \mathbb{C}$ and $|a|^2 + |b|^2 = 1$.
A qubit may be in (uncountably) many different states, whereas a classical
bit is either 0 or 1. When the linear combination $|\psi\rangle = a|0\rangle + b|1\rangle$ is non-trivial,
then we say that $|\psi\rangle$ is in superposition of $|0\rangle$ and $|1\rangle$. Superposition is a very
important quantum resource which is used by many quantum algorithms.

Fig. 1. The Hadamard, Phase Shift, CNOT and CU gates.


The state space that describes a system of n qubits is the Hilbert space $\mathbb{C}^{2^n}$.
If $|\psi\rangle$ and $|\phi\rangle$ are two states of n and m qubits respectively, then the composite
(n + m)-qubit state $|\psi\rangle \otimes |\phi\rangle$ is described by the Kronecker product $\otimes$ of
the original states.

A quantum state $|\psi\rangle \in \mathbb{C}^{2^n}$ may undergo a unitary evolution described by
a unitary matrix $U \in \mathbb{C}^{2^n \times 2^n}$, in which case the new state of the system is
described by the vector $U|\psi\rangle$. Unitary operations (and matrices) are closed under
sequential composition (described by matrix multiplication $\circ$) and under parallel
composition (described by Kronecker product $\otimes$). Sequential composition of
unitary operations is used to describe the temporal evolution of quantum systems,
whereas the parallel composition is used to describe their spatial structure.

The unitary quantum operations are also often called unitary gates. One 
typically chooses a universal gate set which is a small set of unitary operations 
that suffices to express all other unitary operations via (parallel and sequential) 
composition. The universal gate set that we choose for our development is stan- 
dard and we specify these unitary operations next by giving their action on the 
computational basis (which uniquely determines the operations). 

The Hadamard Gate, denoted H, is the 1-qubit unitary map whose action on
the computational basis is given by $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $H|1\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$
and its primary purpose is to generate superposition. The Phase Shift Gate,
denoted $P(\alpha)$, for $\alpha \in \mathbb{R}$, is a 1-qubit unitary map whose action on the
computational basis is given by $P(\alpha)|0\rangle = |0\rangle$ and $P(\alpha)|1\rangle = e^{i\alpha}|1\rangle$ and its primary
purpose is to modify the phase of a quantum state. The family of Phase Shift Gates
is parameterised by the choice of $\alpha \in \mathbb{R}$ and important special cases include the
unitary gates $T \stackrel{\text{def}}{=} P(\pi/4)$ and $Z \stackrel{\text{def}}{=} P(\pi)$. The Controlled-Not Gate (CNOT)
is a 2-qubit unitary map whose action on the computational basis is given by
$\mathrm{CNOT}|00\rangle = |00\rangle$; $\mathrm{CNOT}|01\rangle = |01\rangle$; $\mathrm{CNOT}|10\rangle = |11\rangle$ and $\mathrm{CNOT}|11\rangle = |10\rangle$
and this unitary map may be used to generate quantum entanglement.
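
As a sanity check of these definitions, the following Python sketch writes the three gates as explicit matrices and applies them to basis states. It uses only the standard library; the helper `apply` and the list-of-rows matrix encoding are illustrative choices, and the two-qubit basis is assumed to be ordered |00>, |01>, |10>, |11> with the first qubit as the control.

```python
import cmath
import math

R = 1 / math.sqrt(2)

# Matrices are lists of rows; state vectors are lists of amplitudes.
H = [[R, R], [R, -R]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]


def P(alpha):
    """Phase shift gate: leaves |0> unchanged and multiplies |1> by e^{i alpha}."""
    return [[1, 0], [0, cmath.exp(1j * alpha)]]


def apply(gate, state):
    """Multiply a square matrix by a state vector."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


ZERO, ONE = [1, 0], [0, 1]
plus = apply(H, ZERO)             # equal superposition of |0> and |1>
flipped = apply(P(math.pi), ONE)  # Z = P(pi) maps |1> to -|1>
```

Applying `CNOT` to the vector `[0, 0, 1, 0]`, which encodes |10>, yields `[0, 0, 0, 1]`, which encodes |11>, in agreement with the assignments above.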

Unitary gates admit a diagrammatic representation as quantum circuits. The 
atomic unitary gates we described above are shown in Figure 1. Composite uni- 
tary gates may also be described as circuits (see Figure 2): sequential composition 
amounts to plugging wires of subdiagrams and parallel composition amounts to 
juxtaposition. 

The CNOT gate is the simplest example of a controlled unitary gate. Given
a unitary gate $U : \mathbb{C}^{2^n} \to \mathbb{C}^{2^n}$, the controlled-U unitary gate is the unitary gate
$CU : \mathbb{C}^{2^{n+1}} \to \mathbb{C}^{2^{n+1}}$ whose action is determined by the assignments
$CU(|0\rangle \otimes |\psi\rangle) = |0\rangle \otimes |\psi\rangle$ and $CU(|1\rangle \otimes |\psi\rangle) = |1\rangle \otimes (U|\psi\rangle)$. Controlled unitary operations
are ubiquitous in quantum computing (see Figure 1 for their circuit depiction).
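
In matrix form, and assuming the control is the first (most significant) qubit, CU is block diagonal with an identity block followed by U. A minimal Python sketch of this construction follows; the function name `controlled` is a hypothetical helper, not part of any library.

```python
def controlled(u):
    """Return the matrix of CU for a square matrix `u` given as a list of rows."""
    d = len(u)
    cu = [[0] * (2 * d) for _ in range(2 * d)]
    for i in range(d):
        cu[i][i] = 1                    # control in |0>: act as the identity
        for j in range(d):
            cu[d + i][d + j] = u[i][j]  # control in |1>: act as U
    return cu


X = [[0, 1], [1, 0]]     # the NOT gate
CNOT = controlled(X)     # controlled-NOT recovered as a special case
```

Taking U to be the NOT gate recovers the CNOT matrix, which matches the remark that CNOT is the simplest controlled unitary.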

Fig. 2. A quantum circuit that may be used for the preparation of the Bell state.


Every unitary operation U is reversible with the inverse operation given by
the conjugate transpose, denoted $U^\dagger$, which is again a unitary matrix. Applying
the inverse operation (i.e., the adjoint) of a given unitary map is ubiquitous.


A quantum state $|\psi\rangle \in \mathbb{C}^{2^n}$, with $n > 1$, is said to be entangled when there
exists no non-trivial decomposition $|\psi\rangle = |\phi\rangle \otimes |\tau\rangle$. Quantum entanglement is a
very important resource in quantum computation which is exhibited by many
quantum algorithms. Because of the possibility of entanglement, we cannot, in
general, break down quantum systems into smaller components and we are often
forced to reason about such systems in their entirety. A very important example
of an entangled state is the Bell state given by
$|\mathrm{Bell}\rangle \stackrel{\text{def}}{=} \frac{|00\rangle + |11\rangle}{\sqrt{2}}$.

Preparing a new qubit in state $|0\rangle$ is an admissible physical operation. This,
together with application of unitary gates as part of the computation, allows
us to prepare arbitrary quantum states, e.g., the Bell state can be prepared by
taking $|\mathrm{Bell}\rangle = (\mathrm{CNOT} \circ (H \otimes I))|00\rangle$ (see Figure 2).
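
The preparation of the Bell state by applying H to the first qubit and then CNOT can be checked numerically. The sketch below is plain Python with hypothetical helper names (`kron`, `matmul`, `apply`); it composes the two layers of the circuit with the Kronecker product and matrix multiplication and applies the result to |00>.

```python
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]


def kron(a, b):
    """Kronecker product of two matrices (parallel composition)."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Matrix product (sequential composition: apply b first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(m, v):
    """Multiply a matrix by a state vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


circuit = matmul(CNOT, kron(H, I2))
bell = apply(circuit, [1, 0, 0, 0])  # amplitudes of |00>, |01>, |10>, |11>
```

The resulting amplitudes are 1/sqrt(2) on |00> and |11> and zero elsewhere, up to floating-point rounding.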


Quantum information cannot be directly observed without affecting the state 
of the underlying system. In order to extract information from quantum systems, 
we need to perform a quantum measurement on (parts of) our systems. For 
example, when performing a quantum measurement on a qubit in the state 
$|\psi\rangle = a|0\rangle + b|1\rangle$, there are two possible outcomes: either the quantum system
will collapse to state $|0\rangle$ and we obtain the classical bit 0 as evidence of this event,
or the quantum system will collapse to state $|1\rangle$ and we obtain the classical bit 1
as evidence of this event. The first outcome (corresponding to bit 0) occurs with
probability $|a|^2$ and the second outcome (corresponding to bit 1) occurs with
probability $1 - |a|^2 = |b|^2$. In general, when we measure n qubits simultaneously,
we obtain a bit string of length n which determines the event that occurred and 
the quantum system collapses to a corresponding state with some probability, 
both of which are determined via the Born rule of quantum mechanics. Therefore, 
quantum measurements induce evolutions which are probabilistic and irreversible 
(or destructive), which distinguishes them from unitary evolutions, which are 
deterministic and reversible. 
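
For a single qubit, the measurement rule above can be simulated classically. The following Python sketch is an illustration under the assumption that the input amplitudes are normalised; the function name `measure_qubit` is hypothetical, and the collapsed state is returned up to a global phase.

```python
import math
import random


def measure_qubit(a, b, rng):
    """Measure the state a|0> + b|1>; return (bit, collapsed_state).

    Assumes |a|^2 + |b|^2 = 1. The outcome 0 occurs with probability |a|^2.
    """
    p0 = abs(a) ** 2
    if rng.random() < p0:
        return 0, (1, 0)  # collapsed to |0>
    return 1, (0, 1)      # collapsed to |1>


rng = random.Random(42)
R = 1 / math.sqrt(2)
outcomes = [measure_qubit(R, R, rng)[0] for _ in range(10000)]
```

Measuring the equal superposition repeatedly (each time on a freshly prepared qubit) gives roughly as many zeros as ones, whereas measuring a basis state always returns the corresponding bit.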


Unlike classical information, quantum information cannot be uniformly copied.
This is made precise by the no-cloning theorem [36]: there exists no unitary
operation $U : \mathbb{C}^4 \to \mathbb{C}^4$ such that for every qubit $|\psi\rangle$: $U(|\psi\rangle \otimes |0\rangle) = |\psi\rangle \otimes |\psi\rangle$.
This means that copying of quantum information is a physically inadmissible 
operation. Ideally, quantum programming languages should be designed so that 
these kinds of errors are detected during type checking. 
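
A small numerical experiment illustrates (but of course does not prove) the no-cloning theorem. CNOT is a natural candidate for a copying operation, since it does copy the two basis states; the Python sketch below, with hypothetical helper names, shows that it fails on a superposition.

```python
import math

R = 1 / math.sqrt(2)
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]


def kron_vec(u, v):
    """Kronecker product of two state vectors."""
    return [x * y for x in u for y in v]


def apply(m, v):
    """Multiply a matrix by a state vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def try_clone(qubit):
    """Apply CNOT to the input qubit paired with a fresh |0> qubit."""
    return apply(CNOT, kron_vec(qubit, [1, 0]))


zero, one, plus = [1, 0], [0, 1], [R, R]
attempted = try_clone(plus)     # an entangled (Bell) state
wanted = kron_vec(plus, plus)   # two independent copies of the superposition
```

For the inputs `zero` and `one` the attempt succeeds, but for `plus` the output is the entangled Bell state and not two independent copies, so this candidate is not a uniform copying operation.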

3 Background on the Idris 2 Language


In this section, we give a short overview of the Idris 2 language and its main 
features that are relevant for the development of Qimaera. Idris 2 is a functional 
language with a syntax influenced by that of Haskell. The features of particular 
interest for us are dependent types and linearity, both of which are crucial for 
Qimaera. Its type system is based on Quantitative Type Theory [24,6], which 
specifies how dependent types and linearity are combined. 


Dependent Types. In Idris, types are first-class primitives and they may be 
manipulated like other constructs of the language. This allows us to formulate 
more expressive types that can depend on values, and hence it enables us to 
make some properties and program invariants explicit. 


Example 1. The type of vectors is a simple and useful example of a dependent 
type. A vector is a list with a fixed length that is part of the type. It can be 
defined as follows, where S is the successor function for natural numbers, and a 
is a polymorphic type: 


data Vect : Nat -> Type -> Type where
  Nil : Vect 0 a
  (::) : a -> Vect k a -> Vect (S k) a


The type Vect has two constructors (i.e., introduction rules). The first one
constructs the empty vector, of length zero. The second one is used to introduce
non-empty vectors: a vector with k+1 elements of type a is constructed by
combining an element of type a and a vector of size k.


Type dependency allows us to specify useful program properties and type 
checking ensures that they hold. For instance, we can define an append function 
that concatenates two vectors. Then, the size of the output vector is the sum of 
the sizes of the input vectors and this is specified by its type. 


append : Vect n a -> Vect m a -> Vect (n + m) a


This information allows the language to detect a larger class of programming 
errors. Note that type dependency information is not available for the analogous 
function on lists. Type dependency may also be used to express constraints on 
the inputs of a function, e.g., we can define a total function, called pop, that 
cannot be applied to an empty vector. 


pop : Vect (S k) a -> Vect k a
pop (x :: xs) = xs


Writing “pop []” is now an error which is detected statically, rather than dy- 
namically, and we note that the same cannot be achieved if we were to replace 
vectors with lists. 

Linearity. The type system of Idris 2 is based on Quantitative Type Theory,
where every function argument is associated with a multiplicity that states the
number of times the variable is used at runtime.⁵ This multiplicity can be 0, 1 or
ω. An argument with multiplicity 0 is only used at compile time (to determine
type dependency information) and is erased at runtime. A linear argument has
multiplicity 1 and it is used exactly once at runtime. Finally, ω represents the
unrestricted multiplicity, which is the default, where the function argument may be
used any number of times.


Example 2. Consider the pop function which we just discussed. The (implicitly 
bound) variables k and a have multiplicity 0, because they are not explicitly 
specified as separate arguments, and they are not accessible at runtime in the 
function. The variables x and xs, which are explicitly bound, have the default 
(unrestricted) multiplicity. 


Example 3. An important type which we define in Qimaera is the type of linear 
vectors, which we write as LVect. The only difference, compared to the standard 
vectors in Idris, is that the (::) constructor for LVect is a linear function in all 
of its arguments. Linearity in Idris 2 is specified by writing the multiplicity 1 in 
front of each argument. 


data LVect : Nat -> Type -> Type where
  Nil : LVect 0 a
  (::) : (1 _ : a) -> (1 _ : LVect k a) ->
         LVect (S k) a


We also use linear pairs that are already defined in Idris 2. 


data LPair : Type -> Type -> Type where
  (#) : (1 _ : a) -> (1 _ : b) -> LPair a b


Linearity allows us to specify and enforce constraints on function arguments, 
e.g., it prevents us from duplicating data, so the function definition below leads 
to an error: 


copy : (1 _ : a) -> LPair a a
copy x = x # x


Error: While processing right hand side of 
copy. There are 2 uses of linear name x. 


Linearity is prominently used in Qimaera. In particular, when manipulat- 
ing quantum data, linearity is enforced in order to properly handle quantum 
resources and comply with the laws of quantum mechanics. 


Remark 1. We learned only recently that there is a type of linear vectors in the 
Idris libraries. In the future we might replace our implementation with the one 
provided by the Idris developers. 


5 This can be understood similarly to how variables are used in linear λ-calculi.

data Unitary : Nat -> Type where
  IdGate : Unitary n

  H : (j : Nat) ->
      {auto prf : (j < n) = True} ->
      Unitary n -> Unitary n

  P : (p : Double) -> (j : Nat) ->
      {auto prf : (j < n) = True} ->
      Unitary n -> Unitary n

  CNOT : (c : Nat) -> (t : Nat) ->
         {auto prf1 : (c < n) = True} ->
         {auto prf2 : (t < n) = True} ->
         {auto prf3 : (c /= t) = True} ->
         Unitary n -> Unitary n


Fig. 3. The Unitary data type (file: Unitary.idr).


4 Unitary Operations in Qimaera 


We describe our representation of unitary transformations in Qimaera as an 
algebraic data type called Unitary. Every value of this type is, by design, an 
algebraic decomposition of a unitary operation in terms of the atomic unitary 
gates that we selected in §2. 

The Unitary data type allows us to adopt a high-level algebraic and scalable 
approach towards the reversible fragment of quantum computation. This pro- 
vides the programmer with some benefits as we show in this section. However, 
using the Unitary data type is actually entirely optional. Users who are inter- 
ested in effectful quantum programming do not have to use it (see §5) and they 
may still do hybrid classical-quantum programming, but at the cost of losing the 
algebraic decomposition of unitary operations. However, there are many useful 
functions that are available for manipulating values of type Unitary that are 
not available for effectful quantum programs. 


4.1 The Unitary Data Type 


Quantum unitary operations admit an algebraic representation based on the 
atomic gates from the universal gate set we described. Our idea for the repre- 
sentation of unitary operations is based on this, or equivalently, on how unitary 
operations may be expressed in terms of unitary quantum circuit diagrams. For
these reasons, linearity is not required for our formalisation of unitary
operations. The code for the Unitary data type is listed in Figure 3 and we now 
describe our representation in greater detail. 

Given a natural number n : Nat, the type of unitary operations on n qubits 
is given by Unitary n. Note that Unitary is an algebraic data type with a simple 
type dependency on the arity of the desired operation. The Unitary type has 
four different introduction rules which we describe next. 
The first constructor, IdGate, represents the identity unitary operation on
n qubits. Diagrammatically, we can see this as constructing a circuit of n wires,
without applying any other gates on any of the wires. It has a unique argument, 
n, which is implicit — it can be omitted when calling the IdGate constructor and 
it will often be inferred by Idris. 

The second constructor, H, should be understood as applying the Hadamard 
gate H to the j-th qubit of some previously constructed unitary circuit which is 
specified as the last argument. The first implicit argument, n, is simply the arity 
of the resulting unitary operation. The second implicit argument, prf, is a proof 
obligation that j is smaller than n. This ensures that the argument j identifies an 
existing wire of the previously constructed unitary circuit (last argument) and 
therefore the overall definition is algebraically and physically sound. We think 
that the implicit argument prf may be removed from our implementation if we 
change the type of j to Fin n, the type of natural numbers less than n. However, 
in our experience, we found it easier to work with the current implementation 
rather than with Fin and for this reason we chose to keep the prf argument. 

The third constructor, P, should be viewed as applying the P(p) gate, where
the real number $p \in \mathbb{R}$ is approximated by the term p : Double.⁶ The remaining
arguments serve the same purpose as those for H.

The final constructor, CNOT, should be understood as applying the CNOT 
gate, where c identifies the wire used for the control (the small black dot in Figure 
1), t identifies the wire of the target (the crossed circle in Figure 1) and the last 
(unnamed) argument is the previously constructed unitary circuit on which we 
are applying CNOT. The remaining arguments are implicit: the argument n is 
the arity of the unitary; prf1 and prf2 ensure that c and t identify valid wires 
of the unitary circuit; prf3 ensures that the control and target wires are distinct 
and therefore the overall application of CNOT is physically and algebraically 
admissible. 

In our representation of quantum unitary operations, we make use of type 
dependency to impose proof obligations on some of our constructors in order to 
guarantee that the representation makes sense in physical and algebraic terms. 
Indeed, this might sometimes be a burden for the users of the library. However, 
Idris can sometimes automatically infer the required proofs without any assis- 
tance from the user, e.g., when all arguments are statically known constants (see 
Example 4). This is discussed in detail in the next subsection. 
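
The role of these proof obligations can be illustrated outside Idris. The following Python sketch is not Qimaera code: the names `Circuit`, `id_gate`, `h`, `p` and `cnot` are hypothetical counterparts of the four constructors, and each compile-time proof obligation is replaced by a runtime check. An ill-formed circuit is therefore rejected only when it is built, whereas the dependently typed version rejects it during type checking.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Circuit:
    """A circuit on `n` wires; `gates` lists the gates in application order."""
    n: int
    gates: Tuple[tuple, ...] = ()


def _check_wire(j: int, n: int) -> None:
    # Runtime stand-in for the proof obligation "j is smaller than n".
    if not 0 <= j < n:
        raise ValueError(f"wire {j} is not a wire of a circuit on {n} qubits")


def id_gate(n: int) -> Circuit:
    """Counterpart of IdGate: n wires and no gates."""
    return Circuit(n)


def h(j: int, c: Circuit) -> Circuit:
    """Counterpart of H: append a Hadamard gate on wire j."""
    _check_wire(j, c.n)
    return Circuit(c.n, c.gates + (("H", j),))


def p(phase: float, j: int, c: Circuit) -> Circuit:
    """Counterpart of P: append a phase gate on wire j."""
    _check_wire(j, c.n)
    return Circuit(c.n, c.gates + (("P", phase, j),))


def cnot(control: int, target: int, c: Circuit) -> Circuit:
    """Counterpart of CNOT: both wires must exist and must be distinct."""
    _check_wire(control, c.n)
    _check_wire(target, c.n)
    if control == target:
        raise ValueError("control and target wires must be distinct")
    return Circuit(c.n, c.gates + (("CNOT", control, target),))


# H on wire 0 followed by CNOT with control 0 and target 1, on two wires.
to_bell_basis = cnot(0, 1, h(0, id_gate(2)))
```

Calling `cnot(1, 1, id_gate(2))` or `h(2, id_gate(2))` raises `ValueError` in this sketch, which corresponds to a missing proof in the typed setting.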


4.2 Constructing Unitary Transformations 


The four basic introduction rules of the Unitary type allow us to define high- 
level functions in Idris that can be used to construct complex unitary circuits 
out of simpler ones. We discuss this here and we show that the proof obligations 
from Figure 3 can sometimes be ameliorated and sometimes even completely 
sidestepped. 

6 This approximation is not a big limitation: in fault-tolerant quantum computing 
one usually replaces the P(p) gate family with a single T = P(π/4) gate and the 
resulting gate set suffices to achieve approximation with arbitrary precision. So we 
can easily replace P with a T constructor. 

First, we point out that auto-implicit arguments may occasionally be inferred 
by Idris via suitable search. For example, if all the arguments are known stati- 
cally, the required proofs will often be discovered by Idris and then the users do 
not have to manually provide them. 


Example 4. The unitary circuit from Figure 2 may be constructed in the follow- 
ing way: 

toBellBasis : Unitary 2 
toBellBasis = CNOT 0 1 (H 0 IdGate) 


In this example, Idris is able to infer all the implicit arguments and there is no 
need to provide any proofs. If we do not satisfy one of the constraints, e.g., if 
we write CNOT 1 1 above (which does not make physical sense), then we get the 
following error during type checking: 


Error : While processing right hand side of 
toBellBasis. Can’t find an implementation for 
not (== 1 1) = True. 


An error is also reported if we provide a wire number larger than 1. It is also 
useful to define standalone unitary gates for the H, P(r) and CNOT gates as 
follows: 


HGate : Unitary 1 
HGate = H 0 IdGate 


PGate : Double -> Unitary 1 
PGate r = P r 0 IdGate 


CNOTGate : Unitary 2 
CNOTGate = CNOT 0 1 IdGate 


Composing Unitary Circuits. Our libraries provide functions for sequential 
composition (compose) and parallel composition (tensor) of unitary operations: 


compose : Unitary n -> Unitary n -> Unitary n 
tensor : {n : Nat} -> {p : Nat} -> Unitary n 
-> Unitary p -> Unitary (n + p) 


Notice that neither function requires proof obligations like the ones from 
Figure 3. This means that one of the main algebraic ways of composing unitary 
operations is available without such proofs. The use of these functions 
is ubiquitous in practice and we introduce the infix synonyms (.) and (#) for 
compose and tensor, respectively. 
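
To make the two composition operators concrete, here is a minimal Python sketch over a gate-list representation. It is an illustration under stated assumptions, not the library's implementation: a circuit is modelled as a pair (arity, gates in application order), the helper names are hypothetical, and `compose(g1, g2)` is assumed to apply `g2` first, as ordinary function composition would.

```python
from typing import List, Tuple

Gate = tuple                      # ("H", j), ("P", phase, j) or ("CNOT", c, t)
Circuit = Tuple[int, List[Gate]]  # (number of wires, gates in application order)


def shift(gate: Gate, k: int) -> Gate:
    """Move a gate k wires down; only wire indices change, phases do not."""
    name = gate[0]
    if name == "P":
        return ("P", gate[1], gate[2] + k)
    return (name,) + tuple(w + k for w in gate[1:])


def compose(g1: Circuit, g2: Circuit) -> Circuit:
    """Sequential composition (assumed convention): g2 first, then g1."""
    n1, gates1 = g1
    n2, gates2 = g2
    if n1 != n2:
        raise ValueError("sequential composition needs equal arities")
    return (n1, gates2 + gates1)


def tensor(g1: Circuit, g2: Circuit) -> Circuit:
    """Parallel composition: g1 on the first n1 wires, g2 on the next n2."""
    n1, gates1 = g1
    n2, gates2 = g2
    return (n1 + n2, gates1 + [shift(g, n1) for g in gates2])


h_gate: Circuit = (1, [("H", 0)])
cnot_gate: Circuit = (2, [("CNOT", 0, 1)])
identity_1: Circuit = (1, [])

# Counterpart of the expression CNOTGate . (HGate # IdGate).
to_bell_basis = compose(cnot_gate, tensor(h_gate, identity_1))
```

Neither function needs a bounds check: `tensor` only shifts indices that were valid to begin with, and `compose` only requires equal arities, which the Idris types already guarantee.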


Example 5. The toBellBasis gate from Example 4 may be equivalently ex- 
pressed in the following way: 

toBellBasis : Unitary 2 
toBellBasis = CNOTGate . (HGate # IdGate) 


Qimaera provides another, more general, form of composition via the function 
apply whose type is as follows: 

apply : {i : Nat} -> {n : Nat} -> 
        Unitary i -> Unitary n -> 
        (v : Vect i Nat) -> 
        {auto _ : isInjective n v = True} -> 
        Unitary n 


The apply function is used to apply a smaller unitary circuit of size i to a bigger 
one of size n, giving the vector v of wire indices on which we wish to apply the 
smaller circuit. It needs one auto-implicit proof which enforces the consistency 
requirement that all indices of the wires specified by v are pairwise distinct and 
smaller than n. In fact, the apply function implements the most general notion 
of composition that we support. Both sequential and parallel composition can 
be realised as special cases using it. The importance of the vector v is that it 
determines how to apply the smaller unitary circuit of arity i to any selection of i 
wires of the larger unitary circuit, and moreover, it also allows us to permute the 
inputs/outputs of the smaller unitary circuit while doing so. More specifically, if 
the k-th entry of the vector v is the natural number p, then the k-th input/output 
of the smaller unitary circuit will be applied to the p-th wire of the larger unitary 
circuit. This is best understood by example. 


Example 6. Consider the following code sample: 

U : Unitary 3 
U = HGate # IdGate {n = 1} # (PGate pi) 

apply_example : Unitary 3 
apply_example = apply toBellBasis U v 


where v is a vector of length two. Here, toBellBasis is given in Example 4 and 
represents the circuit given below left; U represents the circuit given below right: 


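[Circuit diagrams not reproduced: toBellBasis (left) and U (right), where U 
applies H to the first wire and P(π) to the third wire.] 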


Table 1 shows what unitary circuit is specified under different values of v. In 
these cases, Idris can automatically infer the required proofs and the user does 
not have to provide them. 


Remark 2. Instead of using apply, there is another possible approach, in the 
spirit of symmetric monoidal categories [23, §XI], where we could add one extra 
introduction rule to the Unitary type for representing permutations of wires. 
However, in our view, this approach is less appealing, because one does not 
usually think of permutations (induced by the symmetric monoidal structure) 
as physical gates. 

Expression                    Gates appended to U (wires numbered from 0) 
apply toBellBasis U [0,1]     H on wire 0, then CNOT with control 0 and target 1 
apply toBellBasis U [0,2]     H on wire 0, then CNOT with control 0 and target 2 
apply toBellBasis U [2,0]     H on wire 2, then CNOT with control 2 and target 0 
apply toBellBasis U [2,1]     H on wire 2, then CNOT with control 2 and target 1 

Table 1. Examples illustrating the apply function. 
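
The wire remapping behind apply can be sketched in a few lines of Python. This is a hedged illustration and not the library's code: circuits are modelled as (arity, gate list) pairs, `is_injective` is a runtime counterpart of the auto-implicit proof, and the sketch assumes that the bigger circuit runs first and the remapped smaller circuit is appended to it.

```python
import math
from typing import List, Sequence, Tuple

Gate = tuple                      # ("H", j), ("P", phase, j) or ("CNOT", c, t)
Circuit = Tuple[int, List[Gate]]  # (number of wires, gates in application order)


def is_injective(n: int, v: Sequence[int]) -> bool:
    """All entries of v are wires of an n-wire circuit and pairwise distinct."""
    return all(0 <= w < n for w in v) and len(set(v)) == len(v)


def remap(gate: Gate, v: Sequence[int]) -> Gate:
    """Send wire k of the smaller circuit to wire v[k] of the bigger one."""
    name = gate[0]
    if name == "P":
        return ("P", gate[1], v[gate[2]])
    return (name,) + tuple(v[w] for w in gate[1:])


def apply(small: Circuit, big: Circuit, v: Sequence[int]) -> Circuit:
    n_small, gates_small = small
    n_big, gates_big = big
    if len(v) != n_small or not is_injective(n_big, v):
        raise ValueError("v must list distinct wires of the bigger circuit")
    # Assumed order: the bigger circuit first, then the remapped smaller one.
    return (n_big, gates_big + [remap(g, v) for g in gates_small])


to_bell_basis: Circuit = (2, [("H", 0), ("CNOT", 0, 1)])
u: Circuit = (3, [("H", 0), ("P", math.pi, 2)])

# With v = [2, 0], input 0 of the smaller circuit lands on wire 2
# and input 1 lands on wire 0, so the CNOT is controlled by wire 2.
example = apply(to_bell_basis, u, [2, 0])
```

A vector such as `[1, 1]` or `[0, 3]` fails the injectivity check here, which is the situation the proof obligation rules out statically.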


Adjoints of Unitary Circuits. Qimaera also provides a function 
adjoint : Unitary n -> Unitary n 


which computes the adjoint (i.e., inverse) of a given unitary circuit. One often has 
to apply the inverse of a given unitary circuit, so having a method such as this 
one is useful. Our implementation uses the standard approach for synthesising 
the adjoint. The adjoint may be used, for example, to uncompute the result of 
the application of unitary gates on auxiliary qubits. 
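
One common way to synthesise an adjoint, sketched below in Python, is to reverse the order of the gates and invert each gate individually. The sketch assumes the gate set H, P and CNOT used in this section and a hypothetical (arity, gate list) representation; it does not reproduce the library's code.

```python
from typing import List, Tuple

Gate = tuple                      # ("H", j), ("P", phase, j) or ("CNOT", c, t)
Circuit = Tuple[int, List[Gate]]  # (number of wires, gates in application order)


def adjoint_gate(gate: Gate) -> Gate:
    """H and CNOT are their own inverses; P(p) is inverted by P(-p)."""
    if gate[0] == "P":
        return ("P", -gate[1], gate[2])
    return gate


def adjoint(circuit: Circuit) -> Circuit:
    """Invert every gate and reverse the order of application."""
    n, gates = circuit
    return (n, [adjoint_gate(g) for g in reversed(gates)])


circuit: Circuit = (2, [("H", 0), ("P", 0.5, 0), ("CNOT", 0, 1)])
inverse = adjoint(circuit)
```

Taking the adjoint twice returns the original gate list, which is a quick sanity check for this construction.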


Controlled Unitary Circuits. We also implement a function 
controlled : {n : Nat} -> Unitary n -> Unitary (S n) 


which, given a unitary circuit U, constructs the corresponding controlled unitary 
circuit CU. Our implementation uses the standard and simple algorithm for 
doing this, but more efficient algorithms may also be implemented in principle. 


Analysis of Unitary Circuits. Unitary circuits are represented in a scal- 
able way in Qimaera and we can use Idris to optimise them. In particular, the 
function: 


optimise : Unitary n -> Unitary n 


may be used to optimise a given unitary circuit by reducing the number of 
gates while keeping the action of the circuit unchanged. So far, this function 
provides only very basic optimisations, but more sophisticated and powerful 
ones may be added in principle. The point we wish to make is that unitary 
circuits in Qimaera may be analysed and manipulated like other algebraic data 
type structures using the capabilities of Idris. In fact, the file Unitary.idr also 
provides other functions that do this. For example, we provide functions for 
calculating the circuit depth, calculating the number of specific atomic gates 
used by a circuit, drawing circuits in the terminal and exporting circuits to 
Qiskit so that users may then use external analysis tools. 

Fig. 4. The QFT unitary circuit on n qubits. 
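
Analyses such as circuit depth and gate counting amount to simple traversals once a circuit is plain data. The Python sketch below illustrates both on a hypothetical (arity, gate list) representation. The depth is computed with a greedy layering that places each gate in the earliest layer after all gates sharing a wire with it; this is one possible definition and is not claimed to match the functions in Unitary.idr.

```python
from collections import Counter
from typing import List, Tuple

Gate = tuple                      # ("H", j), ("P", phase, j) or ("CNOT", c, t)
Circuit = Tuple[int, List[Gate]]  # (number of wires, gates in application order)


def wires(gate: Gate) -> Tuple[int, ...]:
    """Wires touched by a gate; for P the first argument is the phase."""
    return gate[2:] if gate[0] == "P" else gate[1:]


def depth(circuit: Circuit) -> int:
    """Number of layers when every gate is scheduled as early as possible."""
    n, gates = circuit
    busy_until = [0] * n
    for gate in gates:
        layer = 1 + max(busy_until[w] for w in wires(gate))
        for w in wires(gate):
            busy_until[w] = layer
    return max(busy_until, default=0)


def gate_counts(circuit: Circuit) -> Counter:
    """How many times each kind of atomic gate occurs."""
    return Counter(gate[0] for gate in circuit[1])


example: Circuit = (3, [("H", 0), ("H", 1), ("CNOT", 0, 1), ("P", 0.25, 2)])
```

For `example`, the two Hadamard gates and the phase gate share the first layer and the CNOT occupies the second, so the depth is 2.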


4.3 Example: The Quantum Fourier Transform 


The Quantum Fourier Transform (QFT) is an important unitary operator that 
is used in Shor’s polynomial-time algorithm for integer factorisation [34]. The 
unitary circuit which realises QFT on n qubits is shown in Figure 4, where 
$R_n \triangleq P\left(\frac{2\pi}{2^n}\right)$. The Qimaera code which implements this unitary circuit is shown 
in Figure 5. Notice that we make use of the controlled function from §4.2 in 
the function cRm, so that we can implement the controlled $R_n$ gates that are 
required. In this example, we have parameters that are universally quantified, 
so we need a few proofs in the code: one for using the apply function and one 
for correctly unifying the size of the circuit. These proof obligations appear 
when writing the qftRec function and Idris did not infer them automatically, 
so we had to provide the proofs. To get some intuition for the code: the qftRec 
function computes the recursive pattern that applies a Hadamard gate followed 
by the cascade of controlled $R_n$ gates; the qft function then computes the other 
recursive pattern which consists of repeatedly using the pattern computed by 
qftRec and composing as appropriate. 
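
The two recursive patterns can be unrolled into an explicit gate list. The following Python sketch is only an illustration of the structure described above: gate tuples and function names are hypothetical, a controlled $R_m$ gate is written as `("CR", m, control, target)`, and, like the excerpt in Figure 5, the sketch contains no final wire-reversal step.

```python
import math
from typing import List


def rm_phase(m: int) -> float:
    """Phase of R_m = P(2*pi / 2^m)."""
    return 2 * math.pi / 2 ** m


def qft_rec(n: int, offset: int = 0) -> List[tuple]:
    """H on the first wire, then controlled R_2, ..., R_n targeting that wire."""
    if n == 0:
        return []
    gates = [("H", offset)]
    for m in range(2, n + 1):
        # Controlled R_m: the control is wire offset + m - 1, the target is
        # the first wire of the block.
        gates.append(("CR", m, offset + m - 1, offset))
    return gates


def qft(n: int, offset: int = 0) -> List[tuple]:
    """qft_rec on all n wires, then the same pattern on the remaining n - 1."""
    if n == 0:
        return []
    return qft_rec(n, offset) + qft(n - 1, offset + 1)


gates = qft(3)
```

On n wires this produces n Hadamard gates and n(n-1)/2 controlled rotations, i.e. n(n+1)/2 gates in total.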


5 Effectful Quantum Computation 


In the previous section we showed how unitary circuits can be represented in 
Qimaera. This suffices to capture the pure, deterministic and reversible frag- 
ment of quantum computation. However, we need to also consider effectful and 
probabilistic quantum processes which may result from quantum measurements, 
because this is important for hybrid quantum-classical computation. In this sec- 
tion, we show how this can be done in a type-safe way by using monads, linearity 
and dependent types. 

Rm : Nat -> Unitary 1 
Rm m = PGate (2 * pi / (pow 2 (cast m))) 

cRm : Nat -> Unitary 2 
cRm m = controlled (Rm m) 

qftRec : (n : Nat) -> Unitary n 
qftRec 0 = IdGate 
qftRec 1 = HGate 
qftRec (S (S k)) = 
  let t = (qftRec (S k)) # IdGate 
  in rewrite sym $ lemmaplusOneRight k 
  in apply (cRm (S (S k))) t [S k,0] 
           {prf = lemmaInj1 k} 

qft : (n : Nat) -> Unitary n 
qft 0 = IdGate 
qft (S k) = 
  let g = qftRec (S k) 
      h = (IdGate {n = 1}) # (qft k) 
  in h . g 

Fig. 5. Qimaera code for QFT (file: QFT.idr). 


5.1 Representation of Quantum Effects in Qimaera 


We now explain how the quantum program dynamics are represented in Qimaera 
in a type-safe way. Our representation is (roughly) inspired by the notion of 
a quantum configuration as it appears in [32,29,22], which is in turn used to 
formally describe the operational semantics of quantum type systems. 


Qubits in Qimaera. Because of the possibility of quantum entanglement, we 
cannot describe the state of an individual qubit which is part of a larger com- 
posite system. On the other hand, we wish to be able to refer to parts of the 
whole system by identifying specific qubit positions. In Qimaera, we introduce 
the following type declaration: 


data Qubit : Type where 
MkQubit : (n : Nat) -> Qubit 


The argument of type Nat is used as a unique identifier for the constructed qubit. 
The constructor MkQubit is private and users of our libraries cannot access it 
(outside of the library file). Instead, our libraries provide functions (Figure 7) 
that ensure that a term of type Qubit is created with a fresh (i.e., unique) 
natural number that serves as its identifier within a monadic environment. This 
is handled by our functions through careful manipulation of the available data 
within the monadic environment. In fact, these functions are the expected way 
for our users to access or manipulate qubits and, moreover, our users cannot 
access the unique identifiers (unless they modify our libraries). This allows us 
to formulate a representation where values of type Qubit unambiguously refer 
to the relevant parts of larger composite systems. Therefore, a value of type 
Qubit should be understood as a pointer, or as a unique identifier, of a 1-qubit 
subsystem of some larger quantum state. Terms of type Qubit do not carry any 
sort of linear-algebraic information. 


Probabilistic Effects. Quantum measurements induce probabilistic computa- 
tional effects which are inherited by the classical side of the computation in hy- 
brid classical-quantum algorithms. Furthermore, in our intended computational 
scenario, the classical computer (on which Idris is running) sends instructions 
to, and receives data from, the quantum device. In order to correctly model all 
of this, it is clear that we have to use the IO monad in order to encapsulate 
these effects. However, when representing quantum program dynamics, we also 
need to enforce linearity, but none of the functions provided by the IO monad (e.g., 
pure which introduces pure values to monadic types) is linear in any of 
its arguments. This creates a problem which may be solved by using the LIO 
library, which extends the IO monad with linearity. For brevity, we define R to 
be our linear IO monad: 


R : Type -> Type 
R = L IO {use = Linear} 


Then, by using R we can combine IO effects (and thus also probabilistic effects) 
and linearity in a suitable way. 


Quantum State Transformer. Quantum computation is effectful, and more- 
over, quantum information cannot be observed by the classical computer (on 
which Idris is running): it only receives classical information through communi- 
cation with the quantum device. Because of this, we adopt a more abstract view 
on the hybrid classical-quantum computational process. In order to do this, we 
define an (abstract) quantum state transformer by combining several different 
concepts: indexed state monads [4]⁷, linearity and IO (and thus also probabilistic) 
effects. Our representation of these ideas in Qimaera is shown in Figure 6, 
where we omit the function definitions for brevity. 

7 See [33] for a Haskell implementation of this idea. 

data QStateT : Type -> Type -> Type -> Type where 
  MkQST : (1 _ : (1 _ : initialType) -> 
                 R (LPair finalType returnType)) -> 
          QStateT initialType finalType returnType 

runQStateT : (1 _ : initialType) -> 
             (1 _ : QStateT initialType finalType returnType) -> 
             R (LPair finalType returnType) 

pure : (1 _ : a) -> QStateT t t a 
(>>=) : (1 _ : QStateT i m a) -> 
        (1 _ : ((1 _ : a) -> QStateT m o b)) -> 
        QStateT i o b 

Fig. 6. Quantum state transformer (file: QStateT.idr). 

The type QStateT is parameterised by a choice of three (arbitrary) types, so 
it is fairly abstract. Soon, we will see that it is very useful for our purposes. The 
intended interpretation of this type is the following: any value of type 


QStateT initialType finalType returnType 


represents a stateful (quantum) computation starting from a (quantum) state 
of type initialType and ending in a (quantum) state of type finalType which 
produces a user-accessible result of type returnType during the computation. 
For example, a value of type 

QStateT (LPair Qubit Qubit) Qubit Bool 


should be understood as a quantum process that transforms a two-qubit state 
into a single-qubit state and returns a single (classical) value of type Bool to 
the user. The functions presented in Figure 6 allow us to adopt a monadic 
programming discipline when working with QStateT and we do so henceforth. 
We remark that QStateT makes use of the monad R which encapsulates the 
IO (and probabilistic) effects and that linearity is enforced when working with 
QStateT. 
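
The sequencing discipline behind a state transformer can be sketched without dependent types. In the Python sketch below, a transformer is a function from an input state to a pair (output state, result), and `pure` and `bind` play the roles of pure and (>>=). All names and the toy state (a counter of allocated qubits) are hypothetical; Python checks neither the state indices nor linearity, so the sketch only shows how the state is threaded through.

```python
from typing import Any, Callable, Tuple

# A transformer maps an input state to (output state, result).
Transformer = Callable[[Any], Tuple[Any, Any]]


def pure(value: Any) -> Transformer:
    """Return `value` and leave the state unchanged."""
    return lambda state: (state, value)


def bind(m: Transformer, f: Callable[[Any], Transformer]) -> Transformer:
    """Run m, feed its result to f, then run the transformer that f returns."""
    def run(state: Any) -> Tuple[Any, Any]:
        middle_state, result = m(state)
        return f(result)(middle_state)
    return run


# Toy state: the number of allocated qubits. Identifiers are 0, 1, 2, ...
def new_qubit(count: int) -> Tuple[int, int]:
    """Allocate one qubit and return its identifier."""
    return (count + 1, count)


def discard(qubit: int) -> Transformer:
    """Toy stand-in for a measurement: release one qubit, return a fixed bit."""
    return lambda count: (count - 1, False)


program = bind(new_qubit, lambda q:
          bind(new_qubit, lambda r:
          bind(discard(r), lambda bit:
          pure((q, bit)))))

final_state, result = program(0)
```

Starting from zero qubits, `program` allocates two qubits, releases the second, and returns the first identifier together with the bit, so the state goes 0, 1, 2, 1.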


Effectful Quantum Programming. The QStateT monad can be used to 
define a suitable abstract interface for quantum programming. In Figure 7, we 
present an excerpt of the QuantumOp interface which allows us to write quantum 
programs and execute them in a type-safe way. All of the hybrid quantum- 
classical algorithms we present are implemented using this interface. 

The function newQubits is used to prepare p new qubits in state $|0\rangle$ and the 
function returns a linear vector of length p with the qubit identifiers of the newly 
created qubits. The function applyUnitary is used to apply a unitary operation 
of arity i to the qubits specified by the argument LVect (which also determines 
the order of application) and the operation returns an LVect which serves the 
same purpose — it identifies the qubits which were just modified by the unitary 
operator. The file QuantumOp.idr also provides functions applyH, applyP and 
applyCNOT which can be seen as special cases of applyUnitary. However, these 
three functions do not depend on the Unitary type. 

The measure function is used to measure i qubits identified by the LVect 
argument and it returns a value of type Vect i Bool that represents the result 
of the measurement. After this, the i measured qubits are not reused, as one 
can see from the provided type information. 

interface QuantumOp (0 t : Nat -> Type) where 
  newQubits : (p : Nat) -> QStateT (t n) (t (n+p)) (LVect p Qubit) 

  newQubit : QStateT (t n) (t (S n)) Qubit 

  applyUnitary : {n : Nat} -> {i : Nat} -> (1 _ : LVect i Qubit) -> 
                 Unitary i -> QStateT (t n) (t n) (LVect i Qubit) 

  applyH : {n : Nat} -> (1 _ : Qubit) -> QStateT (t n) (t n) Qubit 

  applyP : {n : Nat} -> Double -> (1 _ : Qubit) -> 
           QStateT (t n) (t n) Qubit 

  applyCNOT : {n : Nat} -> (1 _ : Qubit) -> (1 _ : Qubit) -> 
              QStateT (t n) (t n) (LPair Qubit Qubit) 

  measure : {n : Nat} -> {i : Nat} -> (1 _ : LVect i Qubit) -> 
            QStateT (t (i + n)) (t n) (Vect i Bool) 

  measureQubit : {n : Nat} -> (1 _ : Qubit) -> 
                 QStateT (t (S n)) (t n) Bool 

  measureAll : {n : Nat} -> (1 _ : LVect n Qubit) -> 
               QStateT (t n) (t 0) (Vect n Bool) 

  run : QStateT (t 0) (t 0) (Vect n Bool) -> IO (Vect n Bool) 

Fig. 7. The QuantumOp interface (file: QuantumOp.idr). 


Finally, the function run is used to execute quantum algorithms on the quan- 
tum device and obtain the classical information returned from it. Notice that 
run can be used to execute effectful quantum processes which start from the 
trivial quantum state (on zero qubits) and which terminate in the same triv- 
ial quantum state, but which also produce some number of classical bits as a 
user-accessible return result. This may be used to run quantum algorithms: in 
a typical situation, we start with the trivial quantum state (on zero qubits), we 
prepare n qubits in state $|0\rangle$, we apply some unitary operations on them, and we 
finally measure all the qubits, thereby producing n bits of classical information. 
This quantum algorithm may then be represented as a value of type QStateT 
(t 0) (t 0) (Vect n Bool). Running it, however, produces a classical value 
of type IO (Vect n Bool), because the execution is probabilistic and because 
our classical computer (on which we are running Idris) has to perform IO actions 
to communicate with the quantum device. 


In fact, all of the above operations modify the quantum state on the quantum 
device and may cause IO effects, because of the need to communicate with the 
quantum device. This is indeed reflected by our interface. Observe that our 
interface is defined using the QStateT monad transformer which does incorporate 
IO effects (via the R monad we discussed previously). 

Example 7. A fair coin toss may be implemented using quantum resources. The 
process is simple: (1) prepare the state $|0\rangle$; (2) apply the H gate to it; (3) measure 
the qubit and return this as output. We implement this as follows: 


coin : QuantumOp t => IO Bool 
coin = do 
  [b] <- run (do 
      q <- newQubit {t = t} 
      q <- applyH q 
      r <- measure [q] 
      pure r 
    ) 
  pure b 


The top-level do block simply realises monadic sequencing for the standard 
IO monad. The do block within the run environment is more interesting and 
crucial for our development. It performs monadic sequencing for the QStateT 
monad and it represents the simple three-step algorithm we just described. The 
call to the run function executes this algorithm and users obtain the produced 
classical information by storing it in the variable b of type Bool. We emphasise 
that linearity is enforced within the run environment and this is what brings 
safety properties in our approach, e.g., all of the following scenarios are statically 
detected and rejected by Idris: passing the qubit q to a non-linear function, 
copying the qubit q, forgetting to measure the qubit q. For example, if in the 
above code we replace the last two statements in the run environment with 
“pure True”, then Idris statically detects this error. 


The function coin from Example 7 is implemented using our abstract inter- 
face. This means we can use this function in any concrete implementation of 
the QuantumOp interface. Since the authors do not have any quantum hardware, 
we provide one concrete implementation of this interface, called SimulatedOp, 
which performs linear-algebraic simulation of all the required operations. For 
example, if we wish to use the coin function, then the code: 


testCoin : IO Bool 
testCoin = coin {t = SimulatedOp} 


defines a new function, called testCoin, which does the same as coin, but it 
specifically instructs Idris to use linear-algebraic simulation. We emphasise that 
all of our quantum algorithms are written using our abstract interface, so there 
is no need to reimplement them for any additional concrete implementations of 
the interface. 
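
To give a rough idea of what linear-algebraic simulation involves for the coin toss, the Python sketch below tracks the two amplitudes of a single qubit explicitly. It is an independent illustration and not the SimulatedOp implementation; the function names and the use of a seeded `random.Random` generator are assumptions made for the example.

```python
import math
import random
from typing import List


def apply_h(state: List[complex]) -> List[complex]:
    """Hadamard gate on a single-qubit state [amp0, amp1]."""
    a, b = state
    s = 1 / math.sqrt(2)
    return [s * (a + b), s * (a - b)]


def measure(state: List[complex], rng: random.Random) -> bool:
    """Sample a computational-basis outcome; True stands for outcome 1."""
    prob_one = abs(state[1]) ** 2
    return rng.random() < prob_one


def coin(rng: random.Random) -> bool:
    state = [1 + 0j, 0j]        # (1) prepare the state |0>
    state = apply_h(state)      # (2) apply the H gate
    return measure(state, rng)  # (3) measure and return the outcome


rng = random.Random(2023)
tosses = [coin(rng) for _ in range(1000)]
```

After the Hadamard gate both amplitudes have squared modulus 1/2, so each toss returns True or False with equal probability.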


5.2 Example: Repeat-Until-Success Algorithm 


Repeat-until-success (RUS) [27] is an algorithm for implementing quantum uni- 
tary operators by using quantum measurements and general unbounded recur- 
sion. The main advantage of using RUS over traditional deterministic techniques 
that synthesise unitary operators is that with RUS the expected number of T 
gates (which are expensive in terms of error correction⁸) can be reduced. 

RUS : QuantumOp t => (1 _ : Qubit) -> 
      (u' : Unitary 2) -> (e : Unitary 1) -> 
      QStateT (t 1) (t 1) Qubit 
RUS q u' e = do 
  q' <- newQubit 
  [q',q] <- applyUnitary [q',q] u' 
  b <- measureQubit q' 
  if b then do 
         [q] <- applyUnitary [q] (adjoint e) 
         RUS q u' e 
       else pure q 


example_u' : Unitary 2 
example_u' = H 0 $ T 0 $ CNOT 0 1 $ H 0 $ CNOT 0 1 $ T 0 $ 
             H 0 IdGate 


runRUS : QuantumOp t => IO Bool 
runRUS = do 
  [b] <- run (do 
      q <- newQubit {t = t} 
      q <- RUS q example_u' IdGate 
      measure [q] 
    ) 
  pure b 


testRUS : IO Bool 
testRUS = runRUS {t = SimulatedOp} 


Fig. 8. Repeat-until-success algorithm (file: RUS.idr). 

In the simplest case, we wish to realise a fixed single-qubit unitary operator 
$U : \mathbb{C}^2 \to \mathbb{C}^2$. The RUS algorithm is as follows. Given an input qubit $|\psi\rangle$: 
(1) prepare a new qubit in state $|0\rangle$; (2) apply a two-qubit unitary operator 
$U'$ (chosen in advance depending on $U$); (3) measure the first qubit; (4) if the 
measurement outcome is 0 (which occurs with probability p > 0), then the 
output state is $U|\psi\rangle$, as required, and the algorithm terminates; otherwise the 
current state is $E|\psi\rangle$, where $E$ is some other unitary operator (chosen in advance 
depending on $U$), so we apply $E^\dagger$ to this state and we go back to step (1). The 
unitary operators $U'$ and $E$ are chosen in advance, depending on $U$, before the 
algorithm starts so that the above conditions are satisfied. Note that synthesising 
$U'$ and $E$ is not part of the algorithm and we do not discuss this here. 
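
The number of passes through steps (1) to (4) can be estimated with a short calculation. Assume, as in the description above, that every attempt succeeds with the same probability $p > 0$ and that attempts are independent (a failed attempt restores the input state before retrying). The number of attempts $N$ is then geometrically distributed:

```latex
\Pr[N = k] = (1-p)^{k-1}\,p, \qquad
\Pr[N > k] = (1-p)^{k} \xrightarrow{\;k \to \infty\;} 0, \qquad
\mathbb{E}[N] = \sum_{k \ge 1} k\,(1-p)^{k-1}\,p = \frac{1}{p}.
```

Under these assumptions the loop halts with probability 1 and needs $1/p$ attempts on average, although no fixed bound on the number of attempts exists.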

8 We do not automatically implement error correction, so it has to be handled either 
by the developer or provided by the quantum device on the remote end. 

Assuming that appropriate $U'$ and $E$ are chosen, this process terminates with 
probability 1 in state $U|\psi\rangle$ (provided p > 0), so RUS indeed implements the unitary 
operator $U$. Note that this is an algorithmic realisation of $U$, not an algebraic 
one, and so we cannot write a program of type Unitary that achieves this. In- 
stead, we represent this as a quantum program in Figure 8. There, RUS q u' 
e is the quantum state transformer which implements the RUS algorithm as 
above. The function runRUS simply executes the RUS algorithm on a qubit in 
state $|0\rangle$, with the unitary operator chosen from [27, Figure 8], then measures 
the qubit and returns the outcome. Both of these functions are written using 
our abstract interface. The function testRUS is the same as runRUS, but it also 
instructs Idris to use linear-algebraic simulation for the execution. Note that, in 
our implementation, we have taken a specific instance of RUS by choosing $U'$ to 
be the unitary operator described by example_u' as discussed in [27, Figure 8]. 
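
A single attempt with the gate sequence of example_u' can be checked by direct state-vector simulation. The Python sketch below is an independent illustration, not part of Qimaera: it assumes T = P(π/4), takes wire 0 to be the fresh ancilla and wire 1 the input qubit, and stores amplitudes at index 2 * ancilla + data. The candidate operator (I + i√2 X)/√3 is what this simulation yields for the listed gates (up to a global phase); it is not quoted from [27].

```python
import cmath
import math
from typing import List

State = List[complex]  # amplitudes indexed by 2 * ancilla + data

S = 1 / math.sqrt(2)
OMEGA = cmath.exp(1j * math.pi / 4)


def h0(s: State) -> State:
    """Hadamard on the ancilla (wire 0)."""
    return [S * (s[0] + s[2]), S * (s[1] + s[3]),
            S * (s[0] - s[2]), S * (s[1] - s[3])]


def t0(s: State) -> State:
    """T = P(pi/4) on the ancilla."""
    return [s[0], s[1], OMEGA * s[2], OMEGA * s[3]]


def cnot01(s: State) -> State:
    """CNOT with the ancilla as control and the data qubit as target."""
    return [s[0], s[1], s[3], s[2]]


def attempt(psi: List[complex]):
    """Steps (1)-(3): return (probability, normalised data state) per outcome."""
    state = [psi[0], psi[1], 0j, 0j]  # ancilla in |0>, data in |psi>
    for gate in (h0, t0, cnot01, h0, cnot01, t0, h0):
        state = gate(state)
    branches = []
    for outcome in (0, 1):
        amps = state[2 * outcome: 2 * outcome + 2]
        prob = sum(abs(a) ** 2 for a in amps)
        branches.append((prob, [a / math.sqrt(prob) for a in amps]))
    return branches


def same_up_to_phase(x, y) -> bool:
    """True if two normalised states differ only by a global phase."""
    overlap = sum(a.conjugate() * b for a, b in zip(x, y))
    return abs(abs(overlap) - 1) < 1e-9


psi = [0.6 + 0j, 0.8j]
(p_success, out_success), (p_retry, out_retry) = attempt(psi)

# Candidate U = (I + i*sqrt(2)*X) / sqrt(3), applied to psi by hand.
r2, r3 = math.sqrt(2), math.sqrt(3)
u_psi = [(psi[0] + 1j * r2 * psi[1]) / r3, (psi[1] + 1j * r2 * psi[0]) / r3]
```

In this simulation outcome 0 occurs with probability 3/4 and leaves the data qubit in the candidate state, while outcome 1 occurs with probability 1/4 and leaves the input unchanged up to a phase, which is consistent with passing IdGate as the correction e in runRUS.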


Remark 3. The run(-) environment enforces linearity, so if we wish to use the 
RUS function within it, then the qubit argument must be linear in RUS. 


6 Variational Quantum Programming 


In the previous section we saw that Qimaera is suitable for writing recursive and 
effectful quantum programs that make use of quantum measurements. Moreover, 
Idris 2 is an excellent programming language with an advanced type system 
and first-class support for classical programming features. In order to demon- 
strate that Qimaera is suitable for hybrid classical-quantum programming, we 
also have to show that both classical and quantum programming features may 
be elegantly combined. This is the purpose of this section and we achieve this 
by implementing the two most prominent variational quantum algorithms: the 
Quantum Approximate Optimization Algorithm (QAOA) [13] and the Varia- 
tional Quantum Eigensolver (VQE) [30]. In this paper we only describe QAOA. 
See the full paper [12] for more information on the implementation of VQE. 
The objective of QAOA is to try to find the minimum (or maximum) eigen- 
value of a Hamiltonian. A Hamiltonian is a Hermitian (i.e., self-adjoint) matrix 
$\mathcal{H}$ (we use a calligraphic font to differentiate it from H, the Hadamard matrix). 
Its minimum eigenvalue is the minimum (real) value $\lambda$ such that $\mathcal{H}|\psi\rangle = \lambda|\psi\rangle$ 
for some nonzero vector $|\psi\rangle$. As $\mathcal{H}$ is unitarily diagonalizable, this is equivalent 
to the minimum of $\langle\psi|\mathcal{H}|\psi\rangle$ for all vectors $|\psi\rangle$ of norm 1, where $\langle\psi| = |\psi\rangle^\dagger$. 

QAOA starts with some assumption on what the vector $|\psi\rangle$ looks like and 
usually $|\psi\rangle$ is prepared by a quantum circuit that depends on some real param- 
eters $\alpha_1, \ldots, \alpha_\ell$. By measuring this state $|\psi\rangle$, one obtains some information on 
the value of $\langle\psi|\mathcal{H}|\psi\rangle$. This information can then be fed to a classical optimizer 
to change the value of the parameters $\alpha_1, \ldots, \alpha_\ell$ for subsequent execution. 

This classical-quantum back and forth is repeated until some termination 
condition has been satisfied. For example, we may simply repeat 
this process k times, where $k \in \mathbb{N}$ is some constant, but more sophisticated 
termination conditions are also possible. However, there is no guarantee that we 
will find the minimum eigenvalue. 


Implementation of QAOA. QAOA is a variational algorithm [13] that ap- 
proximately solves optimization problems. Let $f : \{0,1\}^n \to \mathbb{R}$ be a function for 
which we want to find its minimum. We see f as a diagonal Hamiltonian over n 
qubits defined by $\mathcal{H}|x\rangle = f(x)|x\rangle$ for all $x \in \{0,1\}^n$. We are therefore searching 
for the minimum eigenvalue of this Hamiltonian. 

In this case, the state $|\psi\rangle$ that minimises the Hamiltonian $\mathcal{H}$ is often assumed 
to be of the form: 

$$|\psi\rangle = (H P(\beta_p) H)^{\otimes n} e^{-i\gamma_p \mathcal{H}} \cdots (H P(\beta_1) H)^{\otimes n} e^{-i\gamma_1 \mathcal{H}} H^{\otimes n} |0\rangle^{\otimes n}.$$

The depth parameter $p \in \mathbb{N}$ is usually fixed to be small, and we have a guarantee that 
the results of our algorithm become better when p becomes larger. To be able 
to produce a circuit which computes $|\psi\rangle$, the Hamiltonian $\mathcal{H}$ may be assumed 
to have a special form so that we can make a circuit for $e^{-i\gamma\mathcal{H}}$. A well-known and 
important example is to compute the maximum cut of an undirected graph, i.e., 
to solve the MAXCUT problem. 

Our implementation for QAOA on the MAXCUT problem is presented in 
the file QAOA.idr and an excerpt is shown in Figure 9. The problem depends 
on the graph G for which we want the maximum cut, a depth parameter p, and 
some real parameters $\beta_i, \gamma_i$. 

In our implementation, we have a function QAOA_Unitary that takes these 
parameters as input and produces a unitary circuit that may be used to pre- 
pare the state $|\psi\rangle$ when applied to the initial state $|0\rangle^{\otimes n}$. We then measure 
this state $|\psi\rangle$ and present the result (a cut of the graph in the obvious bi- 
nary encoding) to an optimiser. Our optimiser is implemented by the function 
classicalOptimisation that uses all observable information from all previous 
runs (which amounts to the values of the parameters $\beta_i, \gamma_i$ and the value of 
the cuts that have been previously obtained through quantum measurements) 
to compute the subsequent rotation parameters $\beta_i, \gamma_i$ that we will use for the 
next iteration. The type of this function indicates that it uses the IO monad: 
this is because we wish to allow the function to use probabilistic optimisation 
algorithms or even external tools. One of the simplest implementations of this 
function chooses the rotation parameters at random. 

The interplay between the classical and the quantum part is presented in 
Figure 9. The function QAOA takes as input a natural number k representing 
how many times the whole routine will be done, the depth p of the circuit, and 
the graph G on which to compute the cut. Notice that the call to the quantum 
device is isolated inside the run function. 
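
The classical side of this loop can be sketched independently of any quantum back end. In the Python sketch below, all names are hypothetical, the optimiser is the simplest one mentioned above (random parameters that ignore the history), and `sample_cut` is a placeholder that returns a uniformly random bit string instead of preparing and measuring $|\psi\rangle$. It therefore illustrates only the bookkeeping: collecting cuts over k rounds and selecting the best one.

```python
import math
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]
Cut = Tuple[bool, ...]
Params = Tuple[List[float], List[float]]


def cut_size(edges: Sequence[Edge], cut: Cut) -> int:
    """Number of edges whose endpoints lie on different sides of the cut."""
    return sum(1 for a, b in edges if cut[a] != cut[b])


def best_cut(edges: Sequence[Edge], cuts: Sequence[Cut]) -> Tuple[Cut, int]:
    """The largest cut among the candidates (requires at least one candidate)."""
    return max(((c, cut_size(edges, c)) for c in cuts), key=lambda pair: pair[1])


def random_parameters(p: int, rng: random.Random) -> Params:
    """Simplest optimiser: ignore previous runs and pick angles at random."""
    betas = [rng.uniform(0.0, math.tau) for _ in range(p)]
    gammas = [rng.uniform(0.0, math.tau) for _ in range(p)]
    return betas, gammas


def sample_cut(n: int, params: Params, rng: random.Random) -> Cut:
    """Placeholder for the quantum step: a uniformly random n-bit outcome."""
    return tuple(rng.random() < 0.5 for _ in range(n))


def qaoa(k: int, p: int, n: int, edges: Sequence[Edge], rng: random.Random):
    """Run k rounds (k >= 1) and return the best cut seen with its size."""
    history = []
    for _ in range(k):
        params = random_parameters(p, rng)
        history.append((params, sample_cut(n, params, rng)))
    return best_cut(edges, [cut for _, cut in history])


triangle = [(0, 1), (1, 2), (0, 2)]
cut, size = qaoa(20, 2, 3, triangle, random.Random(0))
```

Replacing `sample_cut` with a call that prepares the parameterised state and measures it would give the hybrid loop described in the text, with the quantum work confined to that one function.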


7 Related Work 


In this section we compare Qimaera with other existing quantum programming 
languages that are implemented in software. We omit comparisons with quantum 
type systems that do not have a software implementation. We provide a feature 
comparison with some quantum programming languages in Table 2 and we now 
clarify the meaning of some of the selected features. 

By Type Safety we mean that the language can statically detect (and re- 
ject) erroneous programs which duplicate quantum data. General Recursion is 
the ability to express recursive (possibly non-terminating) programs and almost- 
surely-terminating programs, such as RUS (see §5.2). Measurements is the ability 
to use the outcomes of quantum measurements in the control flow of programs. 

QAOA_Unitary : {n : Nat} -> (betas : Vect p Double)
               -> (gammas : Vect p Double)
               -> (graph : Graph n) -> Unitary n

classicalOptimisation : {p : Nat}
                        -> (graph : Graph n)
                        -> (previous_info : Vect k (Vect p Double,
                                                    Vect p Double, Cut n))
                        -> IO (Vect p Double, Vect p Double)

QAOA' : QuantumOp t =>
        {n : Nat} ->
        (k : Nat) -> (p : Nat) -> (graph : Graph n) ->
        IO (Vect k (Vect p Double, Vect p Double, Cut n))
QAOA' 0 p graph = pure []
QAOA' (S k) p graph = do
  previous_info <- QAOA' {t} k p graph
  (betas, gammas) <- classicalOptimisation graph previous_info
  let circuit = QAOA_Unitary betas gammas graph
  cut <- run (do
              qs <- newQubits {t} n
              qs <- applyUnitary qs circuit
              measureAll qs
              )
  pure $ (betas, gammas, cut) :: previous_info

QAOA : QuantumOp t => {n : Nat} -> (k : Nat) -> (p : Nat) ->
       Graph n -> IO (Cut n)
QAOA k p graph = do
  res <- QAOA' {t} k p graph
  let cuts = map (\(_, _, cut) => cut) res
  let (cut, size) = bestCut graph cuts
  pure cut

Fig. 9. Qimaera implementation (excerpt) for the QAOA algorithm solving the
MAX-CUT problem.


Promotion of Measurements is the ability to integrate the outcomes of quantum
measurements as a native classical type (e.g., Bool): this essentially allows
us to switch from a quantum mode of operation into a classical one and allows
us to use both quantum and classical programming paradigms; it may be
roughly understood as corresponding to the promotion rule of linear logic [16].
For Higher-order Functions we distinguish between purely classical ones and 
mixed classical-quantum (in the second column); some languages support both, 
but treat the quantum ones non-linearly which may cause loss of type safety. Fi- 
nally, by Effects we mean the ability to incorporate probabilistic computational 
effects (which are an essential part of the dynamics of hybrid classical-quantum 
programs) and also IO (input/output) effects into our programming workflow. 

The QWIRE language [28,31] and the SQIR language [20,19] are quantum 
circuit languages that are embedded in the Coq proof assistant [11]. Both of these 
languages have access to dependent types, courtesy of Coq. The focus of these
languages is mostly on verification, whereas Qimaera focuses on programming;
moreover, Idris 2 has better support for classical, quantum and effectful
programming features than Coq. Both QWIRE and SQIR represent quantum
primitives through the use of low-level specification languages that are embedded in
Coq: both of these specification languages lack the ability to express quantum 
algorithms that require general recursion and both of them lack the ability to 
express quantum higher-order functions. Because of the former reason, the RUS 
algorithm from §5.2 cannot be expressed in QWIRE or SQIR. 

Silq [9] is a standalone quantum programming language which also is type- 
safe and whose main notable feature is automatic uncomputation of temporary 
values. We currently partially support this feature, because we have clearly iden- 
tified and separated the reversible fragment of quantum computation (see the 
Unitary type) and we can synthesise the required adjoints by calling the adjoint 
function. Compared to Silq, the main advantage of Qimaera is that Idris has bet- 
ter support for classical programming features and so we believe that Qimaera 
is a better choice for hybrid classical-quantum programming. In addition, Silq 
does not support general recursion, so it cannot express quantum algorithms 
that rely on this (e.g., RUS §5.2). 


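                  | Type   | General   | Dependent | Measurements | Promotion of | Higher-order Functions   | Effects
Language          | Safety | Recursion | Types     |              | Measurements | Classical | Quantum      |
------------------+--------+-----------+-----------+--------------+--------------+-----------+--------------+--------
Quipper           | ✗      | ✓         | ✗         | ✓            | ✓            | ✓         | (non-linear) | ✓
Proto-Quipper-D   | ✓      | ✓         | ✓         | ✗            | ✗            | ✓         | ✓            | ✗
Proto-Quipper-Dyn | ✓      | ✓         | ✗         | ✓            | ✓            | ✓         | ✓            | ✓
QWIRE             | ✓      | ✗         | ✓         | ✓            | ✗            | ✓         | ✗            | ✗
SQIR              | ✓      | ✗         | ✓         | ✓            | ✗            | ✓         | ✗            | ✗
Silq              | ✓      | ✗         | (limited) | ✓            | ✓            | ✓         | ✓            | ✗
Qiskit            | ✗      | ✓         | ✗         | ✓            | ✓            | ✓         | (non-linear) | ✓
Q#                | ✗      | ✓         | ✗         | ✓            | ✓            | ✓         | (non-linear) | ✓
Cirq              | ✗      | ✓         | ✗         | ✓            | ✓            | ✓         | (non-linear) | ✓
Qimaera           | ✓      | ✓         | ✓         | ✓            | ✓            | ✓         | ✓            | ✓


Table 2. Feature comparison between Qimaera and other languages.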

Quipper [18] and the Quantum IO monad (QIO) [3] are two domain-specific
languages (DSLs) embedded in Haskell. Neither of them is type-safe because
they do not utilise linearity and they cannot statically detect quantum programs
that are physically inadmissible. However, thanks to the language similarities
between Haskell and Idris, the programming style in these languages is somewhat
similar to ours (e.g., all three use monads). In our view, both of these papers have
been influential for the design of functional quantum programming languages.

Another recent language is Proto-Quipper-D [14], which is a type-safe
circuit description language. This language is based on a novel type system which 
shows how linearity and dependent types can be combined. A fundamental dif- 
ference between Proto-Quipper-D and Qimaera is that linearity is the default 
mode of operation in Proto-Quipper-D, whereas in Qimaera the default mode 
is non-linear. The focus in Proto-Quipper-D is on circuit description and gen- 
eration and the language currently lacks effectful quantum measurements and 
probabilistic effects, so it cannot be used for variational quantum programming 
at present. Another related language is Proto-Quipper-Dyn [15]. It is similar to 
Proto-Quipper-D, but it lacks dependent types (which Qimaera has). On the 
other hand, it can handle quantum measurements and has dynamic lifting, i.e., 
the ability to parameterize quantum circuits based on information observed from 
quantum measurements. Note that Qimaera also has dynamic lifting. 

Other languages include Google’s Cirq [17] (a set of Python libraries), IBM’s
Qiskit [2] (a set of Python libraries) and Microsoft’s Q# [35] (standalone). These
languages offer a wide range of quantum functions and features; however, none
of them is type-safe. Qimaera does not have this problem and this is indeed its
main advantage over them, together with dependent types.


8 Future Work 


For future work, it would be interesting to consider methods that would allow 
us to reduce some of the proof obligations that are imposed by the Unitary 
data type. Going beyond Idris and our library, another natural direction is to 
consider whether programming languages that support substructural approaches 
other than linearity (e.g., uniqueness types, ownership) can be used to achieve 
type-safe quantum programming. It would also be interesting to consider the 
relevance of arrows [21,5] in quantum programming. Furthermore, implement- 
ing and testing our abstract interface on an actual hybrid quantum-classical 
hardware environment would most likely bring additional challenges. 


Abstract. Probabilistic Programming Languages (PPLs) allow users to
encode statistical inference problems and automatically apply an inference
algorithm to solve them. Popular inference algorithms for PPLs,
such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo 
(MCMC), are built around checkpoints—relevant events for the inference 
algorithm during the execution of a probabilistic program. Deciding the 
location of checkpoints is, in current PPLs, not done optimally. To solve 
this problem, we present a static analysis technique that automatically 
determines checkpoints in programs, relieving PPL users of this task. The 
analysis identifies a set of checkpoints that execute in the same order in 
every program run—they are aligned. We formalize alignment, prove the 
correctness of the analysis, and implement the analysis as part of the 
higher-order functional PPL Miking CorePPL. By utilizing the align- 
ment analysis, we design two novel inference algorithm variants: aligned 
SMC and aligned lightweight MCMC. We show, through real-world ex- 
periments, that they significantly improve inference execution time and 
accuracy compared to standard PPL versions of SMC and MCMC. 


Keywords: Probabilistic programming · Operational semantics · Static
analysis.


1 Introduction 


Probabilistic programming languages (PPLs) are languages used to encode sta- 
tistical inference problems, common in research fields such as phylogenetics [39], 
computer vision [16], topic modeling [5], data cleaning [23], and cognitive
science [15]. PPL implementations automatically solve encoded problems by
applying an inference algorithm. In particular, automatic inference allows users
to solve inference problems without having in-depth knowledge of inference al- 
gorithms and how to apply them. Some examples of PPLs are WebPPL [14], 
Birch [31], Anglican [48], Miking CorePPL [25], Turing [12], and Pyro [3]. 

Sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) are 
general-purpose families of inference algorithms often used for PPL implemen- 
tations. These algorithms share the concept of checkpoints: relevant execution 
events for the inference algorithm. For SMC, the checkpoints are likelihood up- 
dates [48,14] and determine the resampling of executions. Alternatively, users 
must sometimes manually annotate or write the probabilistic program in a cer- 
tain way to make resampling explicit [25,31]. For MCMC, checkpoints are instead 
random draws, which allow the inference algorithm to manipulate these draws to 
construct a Markov chain over program executions [47,38]. When designing SMC 
and MCMC algorithms for universal PPLs⁴, both the placement and handling
of checkpoints are critical to making the inference both efficient and accurate. 

For SMC, a standard inference approach is to resample at all likelihood 
updates [14,48]. This approach produces correct results asymptotically [24] but 
is highly problematic for certain models [39]. Such models require non-trivial 
and SMC-specific manual program rewrites to force good resampling locations 
and make SMC tractable. Overall, choosing the likelihood updates at which to 
resample significantly affects SMC execution time and accuracy. 

For MCMC, a standard approach for inference in universal PPLs is lightweight
MCMC [47], which constructs a Markov chain over random draws in programs. 
The key idea is to use an addressing transformation and a runtime database of 
random draws. Specifically, the database enables matching and reusing random 
draws between executions according to their stack traces, even if the random 
draws may or may not occur due to randomness during execution. However, the 
dynamic approach of looking up random draws in the database through their 
stack traces is expensive and introduces significant runtime overhead. 

To overcome the SMC and MCMC problems in universal PPLs, we present 
a static analysis technique for higher-order functional PPLs that automatically 
determines checkpoints in a probabilistic program that always occur in the same 
order in every program execution—they are aligned. We formally define align- 
ment, formalize the alignment analysis, and prove the soundness of the analysis 
with respect to the alignment definition. The novelty and challenge in developing 
the static analysis technique is to capture alignment properties through the iden- 
tification of expressions in programs that may evaluate to stochastic values and 
expressions that may evaluate due to stochastic branching. Stochastic branching 
results from if expressions with stochastic values as conditions or function ap- 
plications where the function itself is stochastic. Stochastic values and branches 
pose a significant challenge when proving the soundness of the analysis. 


4 A term coined by Goodman et al. [13]. Essentially, it means that the types and 
numbers of random variables cannot be determined statically. 

We design two new inference algorithms that improve accuracy and execution
time compared to current approaches. Unlike the standard SMC algorithm
for PPLs [48,14], aligned SMC only resamples at aligned likelihood updates. Re- 
sampling only at aligned likelihood updates guarantees that each SMC execution 
resamples the same number of times, which makes expensive global termination 
checks redundant [25]. We evaluate aligned SMC on two diversification models 
from Ronquist et al. [39] and a state-space model for aircraft localization, demon- 
strating significantly improved inference accuracy and execution time compared 
to traditional SMC. Both models—constant rate birth-death (CRBD) and clado- 
genetic diversification rate shift (ClaDS)—are used in real-world settings and are 
of considerable interest to evolutionary biologists [33,28]. The documentation
of both Anglican [48] and Turing [12] acknowledges the importance of alignment
for SMC and states that all likelihood updates must be aligned. However, Turing
and Anglican neither formalize nor enforce this property—it is up to the users 
to manually guarantee it, often requiring non-standard program rewrites [39]. 

We also design aligned lightweight MCMC, a new version of lightweight 
MCMC [47]. Aligned lightweight MCMC constructs a Markov chain over the 
program using the aligned random draws as synchronization points to match 
and reuse aligned random draws and a subset of unaligned draws between execu- 
tions. Aligned lightweight MCMC does not require a runtime database of random 
draws and therefore reduces runtime overhead. We evaluate aligned lightweight 
MCMC for latent Dirichlet allocation (LDA) [5] and CRBD [39], demonstrat- 
ing significantly reduced execution times and no decrease in inference accuracy. 
Furthermore, automatic alignment is orthogonal to and easily combines with the 
lightweight MCMC optimizations introduced by Ritchie et al. [38]. 

We implement the analysis, aligned SMC, and aligned lightweight MCMC 
in Miking CorePPL [25,7]. In addition to analyzing stochastic if-branching, the 
implementation analyzes stochastic branching at a standard pattern-matching 
construct. Compared to if expressions, the pattern-matching construct requires 
a more sophisticated analysis of the pattern and the value matched against it to 
determine if the pattern-matching causes a stochastic branch. 

In summary, we make the following contributions. 


— We invent and formalize alignment for PPLs. Aligned parts of a program 
occur in the same order in every execution (Section 4.1). 

— We formalize and prove the soundness of a novel static analysis technique 
that determines stochastic value flow and stochastic branching, and in turn 
alignment, in higher-order probabilistic programs (Section 4.2). 

— We design aligned SMC inference that only resamples at aligned likelihood 
updates, improving execution time and inference accuracy (Section 5.1). 

— We design aligned lightweight MCMC inference that only reuses aligned 
random draws, improving execution time (Section 5.2). 

— We implement the analysis and inference algorithms in Miking CorePPL. 
The implementation extends the alignment analysis to identify stochastic 
branching resulting from pattern matching (Section 6). 

Section 7 describes the evaluation and discusses its results. The paper also has
an accompanying artifact that supports the evaluation [26]. Section 8 discusses 
related work and Section 9 concludes. Next, Section 2 considers a simple mo- 
tivating example to illustrate the key ideas. Section 3 introduces syntax and 
semantics for the calculus used to formalize the alignment analysis. 

An extended version of the paper is also available at arXiv [27]. We use the 
symbol Ý in the text to indicate that more information (e.g., proofs) is available 
in the extended version. 


2 A Motivating Example 


This section presents a motivating example that illustrates the key alignment 
ideas in relation to aligned SMC (Section 2.1) and aligned lightweight MCMC 
(Section 2.2). We assume basic knowledge of probability theory. Knowledge of 
PPLs is helpful, but not a strict requirement. The book by van de Meent et 
al. [46] provides a good introduction to PPLs. 

Probabilistic programs encode Bayesian statistical inference problems with 
two fundamental constructs: assume and weight. The assume construct defines 
random variables, which make execution nondeterministic. Intuitively, a proba- 
bilistic program then encodes a probability distribution over program executions 
(the prior distribution), and it is possible to sample from this distribution by 
executing the program with random sampling at assumes. The weight construct 
updates the likelihood of individual executions. Updating likelihoods for execu- 
tions modifies the probability distribution induced by assumes, and the inference 
problem encoded by the program is to determine or approximate this modified 
distribution (the posterior distribution). The main purpose of weight in
real-world models is to condition executions on observed data.⁵

Consider the probabilistic program in Fig. 1a. The program is contrived
and purposefully constructed to compactly illustrate alignment, but the real- 
world diversification models in Ronquist et al. [39] that we also consider in 
Section 7 inspired the program’s general structure. The program defines (line 1) 
and returns (line 18) a Gamma-distributed random variable rate. Figure 1b 
illustrates the Gamma distribution. To modify the likelihood for values of rate, 
the program executes the iter function (line 10) three times, and the survives 
function (line 2) a random number of times n (line 13) within each iter call. 

Conceptually, to infer the posterior distribution of the program, we execute 
the program infinitely many times. In each execution, we draw samples for the 
random variables defined at assume, and accumulate the likelihood at weight. 
The return value of the execution, weighted by the accumulated likelihood, rep- 
resents one sample from the posterior distribution. Fig. 1c shows a histogram 
of such weighted samples of rate resulting from a large number of executions 
of Fig. 1a. The fundamental inference algorithm that produces such weighted
samples is called likelihood weighting (a type of importance sampling [32]). We 


5 A number of more specialized constructs for likelihood updating are also available 
in various PPLs, for example observe [48,14] and condition [14]. 

 1  let rate = assume Gamma(2, 2) in
 2  let rec survives = λn.
 3    if n = 0 then () else
 4      if assume Bernoulli(0.9) then
 5        weight 0.5;
 6        survives (n − 1)
 7      else
 8        weight 0
 9  in
10  let rec iter = λi.
11    if i = 0 then () else
12      weight rate;
13      let n = assume Poisson(rate) in
14      survives n;
15      iter (i − 1)
16  in
17  iter 3;
18  rate
(a) Probabilistic program.

Panels: (b) Gamma(2, 2). (c) Histogram. (d) Aligning weight. (e) Aligning assume.


Fig. 1: A simple example illustrating alignment. Fig. (a) gives a probabilistic
program using functional-style PPL pseudocode. Fig. (b) illustrates the
Gamma(2, 2) probability density function. Fig. (c) illustrates a histogram over
weighted rate samples produced by running the program in (a) a large number
of times. Fig. (d) shows two line number sequences $w_1$ and $w_2$ of weights
encountered in two program runs (top) and how to align them (bottom). Fig.
(e) shows two line number sequences $s_1$ and $s_2$ of assumes encountered in two
program runs (top) and how to align them (bottom).


see that, compared to the prior distribution for rate in Fig. 1b, the posterior is 
more sharply peaked due to the likelihood modifications. 
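
The likelihood weighting procedure described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: the
function names are hypothetical, Gamma(2, 2) is read as shape 2 and scale 2,
and the Poisson sampler is a small hand-written helper (Knuth's method), none
of which comes from the paper's implementation. Comments refer to the line
numbers of the program in Fig. 1a.

```python
import math
import random


def poisson(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def run_once(rng):
    """One execution of the program: returns (rate, accumulated likelihood)."""
    rate = rng.gammavariate(2.0, 2.0)    # line 1 (shape 2, scale 2: an assumption)
    weight = 1.0
    for _ in range(3):                   # line 17: iter 3
        weight *= rate                   # line 12: weight rate
        n = poisson(rng, rate)           # line 13: assume Poisson(rate)
        for _ in range(n):               # lines 2-8: survives n
            if rng.random() < 0.9:       # line 4: assume Bernoulli(0.9)
                weight *= 0.5            # line 5: weight 0.5
            else:
                weight *= 0.0            # line 8: weight 0
                break
    return rate, weight


def likelihood_weighting(num_runs, seed=0):
    """Run the program num_runs times and collect weighted samples of rate."""
    rng = random.Random(seed)
    return [run_once(rng) for _ in range(num_runs)]


def posterior_mean(samples):
    """Weighted average of rate, an estimate of the posterior mean."""
    total = sum(w for _, w in samples)
    return sum(r * w for r, w in samples) / total
```

Many of the returned weights are exactly 0 (executions that reach line 8),
which is the inefficiency that Section 2.1 addresses.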


2.1 Aligned SMC 


Likelihood weighting can only handle the simplest of programs. In Fig. 1a, a
problem with likelihood weighting is that we assign the weight 0 to many exe- 
cutions at line 8. These executions contribute nothing to the final distribution. 
SMC solves this by executing many program instances concurrently and occa- 
sionally resampling them (with replacement) based on their current likelihoods. 
Resampling discards executions with lower weights (in the worst case, 0) and re- 
places them with executions with higher weights. The most common approach in 
popular PPLs is to resample just after likelihood updates (i.e., calls to weight). 

Resampling at all calls to weight in Fig. 1a is suboptimal. The best option is
instead to only resample at line 12. This is because executions encounter lines 5 
and 8 a random number of times due to the stochastic branch at line 3, while they 
encounter line 12 a fixed number of times. As a result of resampling at lines 5 and 
8, executions become unaligned; in each resampling, executions can have reached 
either line 5, line 8, or line 12. On the other hand, if we resample only at line 12, 
all executions will always have reached line 12 for the same iteration of iter in 
every resampling. Intuitively, this is a sensible approach since, when resampling, 
executions have progressed the same distance through the program. We say that
the weight at line 12 is aligned, and resampling only at aligned weights results
in our new inference approach called aligned SMC. Fig. 1d visualizes the weight
alignment for two sample executions of Fig. 1a.
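
To make the resampling schedule concrete, here is a minimal Python sketch of
aligned SMC specialized to the program in Fig. 1a. It is an illustration, not
the Miking CorePPL implementation: the function names, the multinomial
resampling scheme, the shape/scale reading of Gamma(2, 2), and the hand-written
Poisson sampler are all assumptions. Because rate is the only state that
persists across iterations of iter, each particle is represented by its rate
and its current weight.

```python
import math
import random


def poisson(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def survives_weight(rng, n):
    """Likelihood factor of survives n (lines 2-8); these weights are unaligned."""
    w = 1.0
    for _ in range(n):
        if rng.random() < 0.9:   # line 4
            w *= 0.5             # line 5
        else:
            return 0.0           # line 8
    return w


def resample(rng, rates, weights):
    """Multinomial resampling with replacement; weights reset to uniform."""
    if sum(weights) <= 0.0:
        return rates, weights    # every particle has weight 0: nothing to do
    chosen = rng.choices(rates, weights=weights, k=len(rates))
    return chosen, [1.0] * len(rates)


def aligned_smc(num_particles, seed=0):
    rng = random.Random(seed)
    rates = [rng.gammavariate(2.0, 2.0) for _ in range(num_particles)]  # line 1
    weights = [1.0] * num_particles
    for _ in range(3):                                       # line 17: iter 3
        weights = [w * r for w, r in zip(weights, rates)]    # line 12 (aligned)
        rates, weights = resample(rng, rates, weights)       # resample only here
        for j in range(num_particles):                       # lines 13-14
            n = poisson(rng, rates[j])
            weights[j] *= survives_weight(rng, n)            # no resampling
    return rates, weights
```

Every particle resamples exactly three times, once per iteration of iter. The
unaligned weights from lines 5 and 8 are accumulated and only take effect at
the next aligned resampling point (or in the final weights).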


2.2 Aligned Lightweight MCMC 


Another improvement over likelihood weighting is to construct a Markov chain 
over program executions. It is beneficial to propose new executions in the Markov 
chain by making small, rather than large, modifications to the previous execu- 
tion. The lightweight MCMC [47] algorithm does this by redrawing a single 
random draw in the previous execution, and then reusing as many other ran- 
dom draws as possible. Random draws in the current and previous executions 
match through stack traces, that is, the sequences of applications leading up to a
random draw. Consider the random draw at line 13 in Fig. 1a. It is called exactly
three times in every execution. If we identify applications and assumes by line 
numbers, we get the stack traces [17, 13], [17, 15, 13], and [17, 15, 15, 13] for these 
three assumes in every execution. Consequently, lightweight MCMC can reuse 
these draws by storing them in a database indexed by stack traces. 
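
The stack traces above can be reproduced with a short Python sketch that
mirrors the recursion of iter and records the call stack at each Poisson draw.
The function name and the dictionary used as a stand-in for the database are
hypothetical; only the line numbers come from Fig. 1a.

```python
def assume_stack_traces(iterations=3):
    """Collect the stack trace of each draw at line 13 of Fig. 1a."""
    traces = []

    def iter_(i, stack):
        if i == 0:
            return
        traces.append(stack + [13])    # the assume at line 13
        iter_(i - 1, stack + [15])     # the recursive application at line 15

    iter_(iterations, [17])            # the initial application at line 17
    return traces


# A trace-indexed database in the style of lightweight MCMC: one slot per trace.
database = {tuple(trace): None for trace in assume_stack_traces()}
```

Calling `assume_stack_traces()` yields the three traces from the text, and the
database has one entry for each of them.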

The stack trace indexing in lightweight MCMC is overly complicated when 
reusing aligned random draws. Note that the assumes at lines 1 and 13 in Fig. 1a
are aligned, while the assume at line 4 is unaligned. Fig. 1e visualizes the assume
alignment for two sample executions of Fig. 1a. Aligned random draws occur in
the same order in every execution, and are therefore trivial to match and
reuse between executions through indexing by counting. The appeal with stack 
trace indexing is to additionally allow reusing a subset of unaligned draws. 

A key insight in this paper is that aligned random draws can also act as 
synchronization points in the program to allow reusing unaligned draws without a 
stack trace database. After an aligned draw, we reuse unaligned draws occurring 
up until the next aligned draw, as long as they syntactically originate at the 
same assume as the corresponding unaligned draws in the previous execution. 
As soon as an unaligned draw does not originate from the same assume as in 
the previous execution, we redraw all remaining unaligned draws up until the 
next aligned draw. Instead of a trace-indexed database, this approach requires 
storing a list of unaligned draws (tagged with identifiers of the assumes at which 
they originated) for each execution segment in between aligned random draws. 
For example, for the execution $s_1$ in Fig. 1e, we store lists of unaligned Bernoulli
random draws from line 4 for each execution segment in between the three aligned 
random draws at line 13. If a Poisson random draw n at line 13 does not change 
or decreases, we can reuse the stored unaligned Bernoulli draws up until the 
next Poisson random draw as survives executes n or fewer times. If the drawn n 
instead increases to n’, we can again reuse all stored Bernoulli draws, but must 
supplement them with new Bernoulli draws to reach n’ draws in total. 
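
The reuse rule for one execution segment can be sketched as follows. This is a
simplified illustration with hypothetical names: it covers only the replay of
survives between two aligned Poisson draws, where all unaligned draws originate
at the same assume (line 4), and it omits the selection of the single draw that
the MCMC step redraws.

```python
import random


def replay_survives(n, stored, rng, p=0.9):
    """Run survives n, reusing stored Bernoulli draws before sampling new ones.

    Returns the draws this execution used; they become the stored list for
    the same segment in the next iteration of the Markov chain.
    """
    used = []
    for k in range(n):
        # Reuse the k-th stored draw if there is one, otherwise draw afresh.
        draw = stored[k] if k < len(stored) else rng.random() < p
        used.append(draw)
        if not draw:       # line 8: weight 0, and the recursion stops
            break
    return used
```

If n stays the same or decreases, only stored draws are used. If n increases,
all stored draws are reused and fresh ones are appended until survives has run
n times or a draw comes out false.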

As we show in Section 7, using aligned draws as synchronization points works 
very well in practice and avoids the runtime overhead of the lightweight MCMC 
database. However, manually identifying aligned parts of programs and rewriting
them so that inference can make use of alignment is, if even possible,
tedious, error-prone, and impractical for large programs. This paper presents an
automated approach to identifying aligned parts of programs. Combining static 
alignment analysis and using aligned random draws as synchronization points 
form the key ideas of the new algorithm that we call aligned lightweight MCMC. 


3 Syntax and Semantics 


In preparation for the alignment analysis in Section 4, we require an idealized 
base calculus capturing the key features of expressive PPLs. This section intro- 
duces such a calculus with a formal syntax (Section 3.1) and semantics (Sec- 
tion 3.2). We assume a basic understanding of the lambda calculus (see, e.g., 
Pierce [37] for a complete introduction). Section 6 further describes extending 
the idealized calculus and the analysis in Section 4 to a full-featured PPL. 


3.1 Syntax 


We use the untyped lambda calculus as the base for our calculus. We also add 
let expressions for convenience, and if expressions to allow intrinsic booleans 
to affect control flow. The calculus is a subset of the language used in Fig. 1a.
We inductively define terms t and values v as follows. 


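Definition 1 (Terms and values).

\[
\begin{aligned}
t &::= x \mid c \mid \lambda x.\, t \mid t\ t \mid \texttt{let}\ x = t\ \texttt{in}\ t
      \mid \texttt{if}\ t\ \texttt{then}\ t\ \texttt{else}\ t
      \mid \texttt{assume}\ t \mid \texttt{weight}\ t \\
v &::= c \mid \langle \lambda x.\, t, \rho \rangle \\
x, y &\in X \qquad \rho \in P \qquad c \in C \qquad
      \{\texttt{false}, \texttt{true}, ()\} \cup \mathbb{R} \cup D \subset C.
\end{aligned}
\tag{1}
\]

$X$ is a countable set of variable names, $C$ a set of intrinsic values and operations,
and $D \subset C$ a set of probability distributions. The set $P$ contains all evaluation
environments $\rho$, that is, partial functions mapping names in $X$ to values $v$. We
use $T$ and $V$ to denote the set of all terms and values, respectively.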


Values $v$ are intrinsics or closures, where closures are abstractions with an
environment binding free variables in the abstraction body. We require that $C$
include booleans, the unit value (), and real numbers. The reason is that weight
takes real numbers as argument and returns () and that if expression conditions
are booleans. Furthermore, probability distributions are often over booleans and
real numbers. For example, we can include the normal distribution constructor
$\mathcal{N} \in C$ that takes real numbers as arguments and produces normal distributions
over real numbers. In particular, $\mathcal{N}\ 0\ 1 \in D$ is the standard normal distribution.
We often write functions in $C$ in infix position or with standard function
application syntax for readability. For example, $1 + 2$ with $+ \in C$ means $+\ 1\ 2$, and
$\mathcal{N}(0, 1)$ means $\mathcal{N}\ 0\ 1$. Additionally, we use the shorthand $t_1; t_2$ for let _ = $t_1$
in $t_2$, where _ is the do-not-care symbol. That is, $t_1; t_2$ evaluates $t_1$ for side

1  let rec geometric = λ_.
2    let x = assume Bernoulli(0.5) in
3    if x then
4      weight 1.5;
5      1 + geometric ()
6    else 1
7  in geometric ()
(a) Probabilistic program $t_{geo}$.

(b) Probability distributions. Panels: Standard geometric (upper), Weighted
geometric (lower); x-axis: outcomes 1 to 9.


Fig. 2: A probabilistic program $t_{geo}$ [25], illustrating (1). Fig. (a) gives the
program, and (b) the corresponding probability distributions. In (b), the y-axis gives
the probability, and the x-axis gives the outcome (the number of coin flips). The
upper part of (b) excludes the shaded weight at line 4 in (a).


effects only before evaluating $t_2$. Finally, the untyped lambda calculus supports
recursion through fixed-point combinators. We encapsulate this in the shorthand
let rec f = $\lambda x.\, t_1$ in $t_2$ to conveniently define recursive functions.

The assume and weight constructs are PPL-specific. We define random
variables from intrinsic probability distributions with assume (also known as
sample in PPLs with sampling-based inference). For example, the term let x =
assume $\mathcal{N}(0, 1)$ in t defines x as a random variable with a standard normal
distribution in t. Boolean random variables combined with if expressions result
in stochastic branching, which causes the alignment problem. Lastly, weight (also
known as factor or score) is a standard construct for likelihood updating (see,
e.g., Borgström et al. [6]). Next, we illustrate and formalize a semantics for (1).


3.2 Semantics 


Consider the small probabilistic program $t_{geo} \in T$ in Fig. 2a. The program
encodes the standard geometric distribution via a function geometric, which 
recursively flips a fair coin (a Bernoulli(0.5) distribution) at line 2 until the 
outcome is false (i.e., tails). At that point, the program returns the total number 
of coin flips, including the last tails flip. The upper part of Fig. 2b illustrates the 
result distribution for an infinite number of program runs with line 4 ignored. 

To illustrate the effect of weight, consider $t_{geo}$ with line 4 included. This
weight modifies the likelihood with a factor 1.5 each time the flip outcome is
true (or, heads). Intuitively, this emphasizes larger return values, illustrated in
the lower part of Fig. 2b. Specifically, the (unnormalized) probability of seeing
$n$ coin flips is $0.5^n \cdot 1.5^{n-1}$, compared to $0.5^n$ for the unweighted version. The
factor $1.5^{n-1}$ is the result of the calls to weight.
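
As a quick numerical check of these expressions, the following Python sketch
evaluates both versions and normalizes the weighted one. The function names and
the truncation of the normalizing sum at 200 terms are assumptions made for
illustration.

```python
def unweighted(n):
    """Probability of seeing n coin flips with line 4 ignored."""
    return 0.5 ** n


def weighted_unnormalized(n):
    """Unnormalized probability of n flips with the weight at line 4 included."""
    return 0.5 ** n * 1.5 ** (n - 1)


def weighted_normalized(n, terms=200):
    """Normalize by a truncated sum over the outcomes 1..terms."""
    z = sum(weighted_unnormalized(k) for k in range(1, terms + 1))
    return weighted_unnormalized(n) / z
```

The normalizing constant is a geometric series that sums to 2, so the weighted
distribution is $0.25 \cdot 0.75^{n-1}$. It puts less mass on n = 1 and more on
larger outcomes than the unweighted distribution, matching the description of
Fig. 2b.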

We now introduce a big-step operational semantics for single runs of programs 
t. Such a semantics is essential to formalize the probability distributions encoded 
by probabilistic programs (e.g., Fig. 2b for Fig. 2a) and to prove the correctness 
of PPL inference algorithms. For example, Borgström et al. [6] define a PPL
calculus and semantics similar to this paper and formally prove the correctness
of an MCMC algorithm. Another example is Lundén et al. [24], who also define a
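
\[
\begin{gathered}
\frac{}{\rho \vdash x \;{}^{[]}\!\Downarrow^{[]}_{1}\; \rho(x)}\;(\textsc{Var})
\qquad
\frac{}{\rho \vdash c \;{}^{[]}\!\Downarrow^{[]}_{1}\; c}\;(\textsc{Const})
\qquad
\frac{}{\rho \vdash \lambda x.\, t \;{}^{[]}\!\Downarrow^{[]}_{1}\; \langle \lambda x.\, t, \rho \rangle}\;(\textsc{Lam})
\\[1ex]
\frac{\rho \vdash t_1 \;{}^{s_1}\!\Downarrow^{l_1}_{w_1}\; \langle \lambda x.\, t, \rho' \rangle
      \qquad \rho \vdash t_2 \;{}^{s_2}\!\Downarrow^{l_2}_{w_2}\; v_2
      \qquad \rho', x \mapsto v_2 \vdash t \;{}^{s_3}\!\Downarrow^{l_3}_{w_3}\; v}
     {\rho \vdash t_1\ t_2 \;{}^{s_1 \| s_2 \| s_3}\!\Downarrow^{l_1 \| l_2 \| l_3}_{w_1 \cdot w_2 \cdot w_3}\; v}\;(\textsc{App})
\\[1ex]
\frac{\rho \vdash t_1 \;{}^{s_1}\!\Downarrow^{l_1}_{w_1}\; c_1
      \qquad \rho \vdash t_2 \;{}^{s_2}\!\Downarrow^{l_2}_{w_2}\; c_2}
     {\rho \vdash t_1\ t_2 \;{}^{s_1 \| s_2}\!\Downarrow^{l_1 \| l_2}_{w_1 \cdot w_2}\; \delta(c_1, c_2)}\;(\textsc{Const-App})
\qquad
\frac{\rho \vdash t \;{}^{s}\!\Downarrow^{l}_{w}\; d \qquad w' = f_d(c)}
     {\rho \vdash \texttt{assume}\ t \;{}^{s \| [c]}\!\Downarrow^{l}_{w \cdot w'}\; c}\;(\textsc{Assume})
\\[1ex]
\frac{\rho \vdash t_1 \;{}^{s_1}\!\Downarrow^{l_1}_{w_1}\; v_1
      \qquad \rho, x \mapsto v_1 \vdash t_2 \;{}^{s_2}\!\Downarrow^{l_2}_{w_2}\; v}
     {\rho \vdash \texttt{let}\ x = t_1\ \texttt{in}\ t_2 \;{}^{s_1 \| s_2}\!\Downarrow^{l_1 \| [x] \| l_2}_{w_1 \cdot w_2}\; v}\;(\textsc{Let})
\qquad
\frac{\rho \vdash t \;{}^{s}\!\Downarrow^{l}_{w}\; w'}
     {\rho \vdash \texttt{weight}\ t \;{}^{s}\!\Downarrow^{l}_{w \cdot w'}\; ()}\;(\textsc{Weight})
\\[1ex]
\frac{\rho \vdash t_1 \;{}^{s_1}\!\Downarrow^{l_1}_{w_1}\; \texttt{true}
      \qquad \rho \vdash t_2 \;{}^{s_2}\!\Downarrow^{l_2}_{w_2}\; v_2}
     {\rho \vdash \texttt{if}\ t_1\ \texttt{then}\ t_2\ \texttt{else}\ t_3 \;{}^{s_1 \| s_2}\!\Downarrow^{l_1 \| l_2}_{w_1 \cdot w_2}\; v_2}\;(\textsc{If-True})
\\[1ex]
\frac{\rho \vdash t_1 \;{}^{s_1}\!\Downarrow^{l_1}_{w_1}\; \texttt{false}
      \qquad \rho \vdash t_3 \;{}^{s_3}\!\Downarrow^{l_3}_{w_3}\; v_3}
     {\rho \vdash \texttt{if}\ t_1\ \texttt{then}\ t_2\ \texttt{else}\ t_3 \;{}^{s_1 \| s_3}\!\Downarrow^{l_1 \| l_3}_{w_1 \cdot w_3}\; v_3}\;(\textsc{If-False})
\end{gathered}
\]


Fig. 3: A big-step operational semantics for terms, formalizing single runs of
programs $t \in T$. The operation $\rho, x \mapsto v$ produces a new environment extending $\rho$
with a binding $v$ for $x$. For each distribution $d \in D$, $f_d$ is its probability density
or probability mass function, encoding the relative probability of drawing
particular values from the distribution. For example, $f_{\mathrm{Bernoulli}(0.3)}(\texttt{true}) = 0.3$ and
$f_{\mathrm{Bernoulli}(0.3)}(\texttt{false}) = 1 - 0.3 = 0.7$. We use $\cdot$ to denote multiplication.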


similar calculus and semantics and prove the correctness of PPL SMC algorithms. 
In particular, the correctness of our aligned SMC algorithm (Section 5.1) follows 
from this proof. The purpose of the semantics in this paper is to formalize 
alignment and prove the soundness of our analysis in Section 4. We use a big- 
step semantics as the finer granularity in a small-step semantics is redundant. 
We begin with a definition for intrinsics. 


Definition 2 (Intrinsic functions). For every $c \in C$, we attach an arity
$|c| \in \mathbb{N}$. We define a partial function $\delta : C \times C \to C$ such that $\delta(c, c_1) = c_2$ is
defined for $|c| > 0$. For all $c$, $c_1$, and $c_2$, such that $\delta(c, c_1) = c_2$, $|c_2| = |c| - 1$.


Intrinsic functions are curried and produce intrinsics or intrinsic functions of one
arity less through $\delta$. For example, for $+ \in C$, we have $\delta(\delta(+, 1), 2) = 3$, $|+| = 2$,
$|\delta(+, 1)| = 1$, and $|\delta(\delta(+, 1), 2)| = 0$. Next, we make randomness in our semantics
deterministic via a trace of random draws, in the style of Kozen [22].
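
A small Python sketch can make the arity bookkeeping of Definition 2 concrete.
The class and function names are hypothetical, and plain Python values stand in
for intrinsics of arity 0; this is one possible encoding, not the paper's.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Intrinsic:
    """An intrinsic function with arity > 0 and the arguments collected so far."""
    name: str
    arity: int
    fn: Callable[..., Any]
    args: Tuple[Any, ...] = ()


def arity(c):
    """|c|: plain Python values stand in for intrinsics of arity 0."""
    return c.arity if isinstance(c, Intrinsic) else 0


def delta(c, c1):
    """delta(c, c1): defined only when |c| > 0; the result has arity |c| - 1."""
    if arity(c) == 0:
        raise ValueError("delta is undefined for intrinsics of arity 0")
    args = c.args + (c1,)
    if c.arity == 1:
        return c.fn(*args)               # saturated: compute the result
    return Intrinsic(c.name, c.arity - 1, c.fn, args)


PLUS = Intrinsic("+", 2, lambda a, b: a + b)
```

With this encoding, `delta(delta(PLUS, 1), 2)` evaluates to 3, and the arities
along the way are 2, 1, and 0, as in the example above.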


Definition 3 (Traces). The set $S$ of traces is the set such that, for all $s \in S$,
$s$ is a sequence of intrinsics from $C$ with arity 0.


In the following, we use the notation $[c_1, c_2, \ldots, c_n]$ for sequences and $\|$ for
sequence concatenation. For example, $[c_1, c_2] \| [c_3, c_4] = [c_1, c_2, c_3, c_4]$. We also
use subscripts to select elements in a sequence, e.g., $[c_1, c_2, c_3, c_4]_2 = c_2$. In
practice, traces are often sequences of real numbers, e.g., $[1.1, 3.2, 8.4] \in S$.
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Fig. 3 presents the semantics as a relation p F t *\}/? v over Px Tx SxRx 
L x V. L is the set of sequences over X, i.e., sequences of names. For example, 
[x,y,z] € L, where x,y,z E€ X. We use l € L to track the sequence of let- 
bindings during evaluation. For example, evaluating let x = 1 in let y = 2 
in x+y results in l = [x,y]. In Section 4, we use the sequence of encountered 
let-bindings to define alignment. For simplicity, from now on we assume that 
bound variables are always unique (i.e., variable shadowing is impossible). 

It is helpful to think of $\rho$, $t$, and $s$ as the input to $\Downarrow$, and $l$, $w$, and $v$ as the output. In the environment $\rho$, the term $t$, with trace $s$, evaluates to $v$, encounters the sequence of let bindings $l$, and accumulates the weight $w$. The trace $s$ is the sequence of all random draws, and each random draw in (ASSUME) consumes precisely one element of $s$. The rule (LET) tracks the sequence of bindings by adding $x$ at the correct position in $l$. The number $w$ is the likelihood of the execution: the probability density of all draws in the program, registered at (ASSUME), combined with direct likelihood modifications, registered at (WEIGHT). The remaining aspects of the semantics are standard (see, e.g., Kahn [20]). To give an example of the semantics, we have $\emptyset \vdash t_{\mathit{geo}} \;{}^{s}{\Downarrow}^{l}_{w}\; 4$, with the corresponding $s$, $l$, and $w$, for the particular execution of $t_{\mathit{geo}}$ making three recursive calls. Next, we formalize and apply the alignment analysis to (1).


4 Alignment Analysis 


This section presents the main contribution of this paper: automatic alignment 
in PPLs. Section 4.1 introduces A-normal form and gives a precise definition of 
alignment. Section 4.2 formalizes and proves the correctness of the alignment 
analysis. Lastly, Section 4.3 discusses a dynamic version of alignment. 


4.1 A-Normal Form and Alignment 


To reason about all subterms t’ of a program t and to enable the analysis in 
Section 4.2, we need to uniquely label all subterms. A straightforward approach 
is to use variable names within the program itself as labels (remember that 
we assume bound variables are always unique). This leads us to the standard 
A-normal form (ANF) representation of programs [11]. 


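Definition 4 (A-normal form).

$$
\begin{aligned}
t_{\mathrm{ANF}} &::= x \mid \mathtt{let}\ x = t'_{\mathrm{ANF}}\ \mathtt{in}\ t_{\mathrm{ANF}} \\
t'_{\mathrm{ANF}} &::= x \mid c \mid \lambda x.\, t_{\mathrm{ANF}} \mid x\ y \\
&\ \mid \mathtt{if}\ x\ \mathtt{then}\ t_{\mathrm{ANF}}\ \mathtt{else}\ t_{\mathrm{ANF}} \mid \mathtt{assume}\ x \mid \mathtt{weight}\ x
\end{aligned}
\tag{2}
$$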


We use $T_{\mathrm{ANF}}$ to denote the set of all terms $t_{\mathrm{ANF}}$. Unlike $t \in T$, $t_{\mathrm{ANF}} \in T_{\mathrm{ANF}}$ enforces that a variable bound by a let labels each subterm in the program. Furthermore, we can automatically transform any program in $T$ to a semantically equivalent $T_{\mathrm{ANF}}$ program, and $T_{\mathrm{ANF}} \subset T$. Therefore, we assume in the remainder of the paper that all terms are in ANF.

Given the importance of alignment in universal PPLs, it is somewhat surprising that there are no previous attempts to give a formal definition of its meaning.
Here, we give a first such formal definition, but before defining alignment, we 
require a way to restrict, or filter, sequences. 


Definition 5 (Restriction of sequences). For all $l \in L$ and $Y \subseteq X$, $l|_Y$ (the restriction of $l$ to $Y$) is the subsequence of $l$ with all elements not in $Y$ removed.


For example, $[x, y, z, y, x]|_{\{x, z\}} = [x, z, x]$. We now formally define alignment.


Definition 6 (Alignment). For $t \in T_{\mathrm{ANF}}$, let $X_t$ denote all variables that occur in $t$. The sets $A_t \in \mathcal{A}_t$, $A_t \subseteq X_t$, are the largest sets such that, for arbitrary $\emptyset \vdash t \;{}^{s_1}{\Downarrow}^{l_1}_{w_1}\; v_1$ and $\emptyset \vdash t \;{}^{s_2}{\Downarrow}^{l_2}_{w_2}\; v_2$, $l_1|_{A_t} = l_2|_{A_t}$.


For a given $A_t$, the aligned expressions (expressions bound by a let to a variable name in $A_t$) are those that occur in the same order in every execution, regardless of random draws. We seek the largest sets, as $A_t = \emptyset$ is always a trivial solution. Assume we have a program with $X_t = \{x, y, z\}$ and such that $l_1 = [x, y, x, z, x]$ and $l_2 = [x, y, x, z, x, y]$ are the only possible sequences of let bindings. Then, $A_t = \{x, z\}$ is the only possibility. It is also possible to have multiple choices for $A_t$. For example, if $l_1 = [x, y, z]$ and $l_2 = [x, z, y]$ are the only possibilities, then $\mathcal{A}_t = \{\{x, z\}, \{x, y\}\}$. Next, assume that we transform the programs in Fig. 2a and Fig. 1a to ANF. The expression labeled by $x$ in Fig. 2a is then clearly not aligned, as random draws determine how many times it executes ($l$ could be, e.g., $[x, x]$ or $[x, x, x, x]$). Conversely, the expression n (line 13) in Fig. 1a is aligned, as its number and order of evaluations do not depend on any random draws.
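The two small examples above (sequences over x, y, z) can be checked mechanically. The following is a minimal Python sketch, assuming that the finite list `runs` enumerates every possible let-sequence of the program; Definition 6 quantifies over all executions, so this brute-force search is only illustrative. The function names are hypothetical, and "largest" is read as maximal with respect to set inclusion.

```python
from itertools import combinations

def restrict(l, Y):
    """Restriction of sequence l to the names in Y (Definition 5)."""
    return [x for x in l if x in Y]

def is_aligned(A, runs):
    """True if all let-sequences in runs agree when restricted to A."""
    first = restrict(runs[0], A)
    return all(restrict(l, A) == first for l in runs[1:])

def largest_aligned_sets(variables, runs):
    """All inclusion-maximal subsets A of variables that are aligned over runs."""
    candidates = [set(c)
                  for r in range(len(variables) + 1)
                  for c in combinations(sorted(variables), r)
                  if is_aligned(set(c), runs)]
    return [A for A in candidates if not any(A < B for B in candidates)]

X = {"x", "y", "z"}
print(restrict(["x", "y", "z", "y", "x"], {"x", "z"}))
print(largest_aligned_sets(X, [["x", "y", "x", "z", "x"],
                               ["x", "y", "x", "z", "x", "y"]]))
print(largest_aligned_sets(X, [["x", "y", "z"], ["x", "z", "y"]]))
```

The first search yields the single set {x, z}; the second yields the two sets {x, y} and {x, z}, matching the examples in the text.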

Definition 6 is context insensitive: for a given $A_t$, each $x$ is either aligned or unaligned. One could also consider a context-sensitive definition of alignment in which $x$ can be aligned in some contexts and unaligned in others. A context could, for example, be the sequence of function applications (i.e., the call stack) leading up to an expression. Considering different contexts for $x$ is complicated and difficult to take full advantage of. We justify the choice of context-insensitive alignment with the real-world models in Section 7, none of which requires a context-sensitive alignment.

With alignment defined, we now move on to the static alignment analysis. 


4.2 Alignment Analysis 


The basis for the alignment analysis is 0-CFA [34,42]—a static analysis frame- 
work for higher-order functional programs. The prefix 0 indicates that 0-CFA is 
context insensitive. There is also a set of analyses k-CFA [30] that adds increasing amounts (with $k \in \mathbb{N}$) of context sensitivity to 0-CFA. We could use such analyses with a context-sensitive version of Definition 6. However, the potential benefit of k-CFA is also offset by the worst-case exponential time complexity, already at $k = 1$. In contrast, the time complexity of 0-CFA is polynomial (cubic in the worst case). The alignment analysis for the models in Section 7 runs instantaneously, justifying that the time complexity is not a problem in practice.
     1  let n1 = ¬ in let n2 = ¬ in
     2  let one = 1 in
     3  let half = 0.5 in let c = true in
     4  let f1 = λx1. let t1 = weight one in t1 in
     5  let f2 = λx2. let t2 = weight one in t2 in
     6  let f3 = λx3. let t3 = weight one in t3 in
     7  let f4 = λx4. let t4 = weight one in t4 in
     8  let bern = Bernoulli in
     9  let d1 = bern half in
    10  let a1 = assume d1 in
    11  let v1 = f1 one in
    12  let v2 = n1 a1 in
    13  let v3 = n2 c in
    14  let f5 =
    15    if a1 then let t5 = f4 one in f2
    16    else f3
    17  in
    18  let v4 = f5 one in
    19  let i1 =
    20    if c then let t6 = f1 one in t6
    21    else one
    22  in i1


Fig. 4: A program $t_{\mathit{example}} \in T_{\mathrm{ANF}}$ illustrating the analysis.


The extensions to 0-CFA required to analyze alignment are non-trivial to 
design, but the resulting formalization is surprisingly simple. The challenge is 
instead to prove that the extensions correctly capture the alignment property 
from Definition 6. We extend 0-CFA to analyze stochastic values and alignment in programs $t \in T_{\mathrm{ANF}}$. As with most static analyses, our analysis is sound but conservative (i.e., sound but incomplete): the analysis may mark aligned expressions of programs as unaligned, but not vice versa. That the analysis is conservative does not degrade the alignment analysis results for any model in Section 7, which justifies the approach. We divide the formal analysis into two algorithms. Algorithm 1 generates constraints for $t$ that a valid analysis solution must satisfy. This section describes Algorithm 1 and the generated constraints. The second algorithm computes a solution that satisfies the generated constraints. We describe the second algorithm at a high level, but omit a full formalization.

For soundness of the analysis, we require $(\lambda x.\, t, \rho) \notin C$ (recall that $C$ is the set of intrinsics). That is, closures are not in $C$. By Definition 3, this implies that closures are not in the sample space of probability distributions in $D$ and that evaluating intrinsics never produces closures (this would unnecessarily complicate the analysis without any benefit).

In addition to standard 0-CFA constraints, Algorithm 1 generates new constraints for stochastic values and unalignment. We use the contrived but illustrative program in Fig. 4 as an example. Note that, while omitted from Fig. 4 for ease of presentation, the analysis also supports recursion introduced through let rec. Stochastic values are values in the program affected by random variables. Stochastic values initially originate at assume and then propagate through programs via function applications and if expressions. For example, $a_1$ (line 10) is stochastic because of assume. We subsequently use $a_1$ to define $v_2$ via $n_1$ (line 12), which is then also stochastic. Similarly, $a_1$ is the condition for the if resulting in $f_5$ (line 14), and the function $f_5$ is therefore also stochastic. When we apply $f_5$, it results in yet another stochastic value, $v_4$ (line 18). In conclusion, the stochastic values are $a_1$, $v_2$, $f_5$, and $v_4$.

Consider the flow of unalignment in Fig. 4. We mark expressions that may execute due to stochastic branching as unaligned. From our analysis of stochastic values, the program's only stochastic if condition is at line 15, and we determine that all expressions directly within the branches are unaligned. That is, the expression labeled by $t_5$ is unaligned. Furthermore, we apply the variable $f_4$ when defining $t_5$. Thus, all expressions in bodies of lambdas that flow to $f_4$ are unaligned. Here, it implies that $t_4$ is unaligned. Finally, we established that the function $f_5$ produced at line 15 is stochastic. Due to the application at line 18, all names bound by lets in bodies of lambdas that flow to $f_5$ are unaligned. Here, it implies that $t_2$ and $t_3$ are unaligned. In conclusion, the unaligned expressions are named by $t_2$, $t_3$, $t_4$, and $t_5$. For example, aligned SMC therefore resamples at the weight at $t_1$, but not at the weights at $t_2$, $t_3$, and $t_4$.


Consider the program in Fig. 1a again, and assume it is transformed to ANF. The alignment analysis must mark all names bound within the stochastic if at line 3 as unaligned because a stochastic value flows to its condition. In particular, the weight expressions at lines 5 and 8 are unaligned (and the weight at line 12 is aligned). Thus, aligned SMC resamples only at line 12.


To formalize the flow of stochastic values, we define abstract values $a ::= \lambda x.y \mid \mathit{stoch} \mid \mathit{const}\ n$, where $x, y \in X$ and $n \in \mathbb{N}$. We use $A$ to denote the set of all abstract values. The stoch abstract value is new and represents stochastic values. The $\lambda x.y$ and $\mathit{const}\ n$ abstract values are standard and represent abstract closures and intrinsics, respectively. For each variable name $x$ in the program, we define a set $S_x$ containing abstract values that may occur at $x$. For example, in Fig. 4, we have $\mathit{stoch} \in S_{a_1}$, $(\lambda x_2.t_2) \in S_{f_2}$, and $(\mathit{const}\ 1) \in S_{n_1}$. The abstract value $\lambda x_2.t_2$ represents all closures originating at $\lambda x_2$, and $\mathit{const}\ 1$ represents intrinsic functions in $C$ of arity 1 (in our example, ¬). The body of the abstract lambda is the variable name labeling the body, not the body itself. For example, $t_2$ labels the body let t2 = weight one in t2 of $\lambda x_2$. Due to ANF, all terms have a label, which the function NAME in Algorithm 1 formalizes.


We also define booleans $\mathit{unaligned}_x$ that state whether or not the expression labeled by $x$ is unaligned. For example, we previously reasoned that $\mathit{unaligned}_x = \mathit{true}$ for $x \in \{t_2, t_3, t_4, t_5\}$ in Fig. 4. The alignment analysis aims to determine minimal sets $S_x$ and boolean assignments of $\mathit{unaligned}_x$ for every program variable $x \in X$. A trivial solution is that all abstract values (there is a finite number of them in the program) flow to each program variable and that $\mathit{unaligned}_x = \mathit{true}$ for all $x \in X$. This solution is sound but useless. To compute a more precise solution, we follow the rules given by constraints $c \in R$.


We present the constraints through the GENERATECONSTRAINTS function in Algorithm 1 and for the example in Fig. 4. There are no constraints for variables that occur at the end of ANF let sequences (line 2 in Algorithm 1), and the case for let expressions (lines 3-36) instead produces all constraints. The cases for aliases (line 6), intrinsics (line 7), assume (line 35), and weight (line 36) are the simplest. Aliases of the form let x = y in t establish $S_y \subseteq S_x$. That is, all abstract values at $y$ are also at $x$. Intrinsic operations result in a const abstract value. For example, the definition of $n_1$ at line 1 in Fig. 4 results in the constraint $\mathit{const}\ 1 \in S_{n_1}$. Applications of assume are the source of stochastic values. For example, the definition of $a_1$ at line 10 results in the constraint $\mathit{stoch} \in S_{a_1}$. Note that assume cannot produce any other abstract values, as we only
Algorithm 1 Constraint generation function for t ∈ T_ANF. We denote the power set of a set E with P(E).

    function GENERATECONSTRAINTS(t): T_ANF → P(R) =
     1  match t with
     2  | x → ∅
     3  | let x = t1 in t2 →
     4    GENERATECONSTRAINTS(t2) ∪
     5    match t1 with
     6    | y → {S_y ⊆ S_x}
     7    | c → if |c| > 0 then {const |c| ∈ S_x}
     8          else ∅
     9    | λy. t_y → GENERATECONSTRAINTS(t_y)
    10        ∪ {λy. NAME(t_y) ∈ S_x}
    11        ∪ {unaligned_y ⇒ unaligned_n
    12           | n ∈ NAMES(t_y)}
    13    | lhs rhs → {
    14        ∀z ∀y  λz.y ∈ S_lhs
    15          ⇒ (S_rhs ⊆ S_z) ∧ (S_y ⊆ S_x),
    16        ∀n  (const n ∈ S_lhs) ∧ (n > 1)
    17          ⇒ const n − 1 ∈ S_x,
    18        stoch ∈ S_lhs ⇒ stoch ∈ S_x,
    19        const _ ∈ S_lhs
    20          ⇒ (stoch ∈ S_rhs ⇒ stoch ∈ S_x),
    21        unaligned_x
    22          ⇒ (∀y  λy._ ∈ S_lhs ⇒ unaligned_y),
    23        stoch ∈ S_lhs
    24          ⇒ (∀y  λy._ ∈ S_lhs ⇒ unaligned_y)
    25      }
    26    | if y then t_t else t_e →
    27        GENERATECONSTRAINTS(t_t)
    28        ∪ GENERATECONSTRAINTS(t_e)
    29        ∪ {S_NAME(t_t) ⊆ S_x, S_NAME(t_e) ⊆ S_x,
    30           stoch ∈ S_y ⇒ stoch ∈ S_x}
    31        ∪ {unaligned_x ⇒ unaligned_n
    32           | n ∈ NAMES(t_t) ∪ NAMES(t_e)}
    33        ∪ {stoch ∈ S_y ⇒ unaligned_n
    34           | n ∈ NAMES(t_t) ∪ NAMES(t_e)}
    35    | assume _ → {stoch ∈ S_x}
    36    | weight _ → ∅
    37
    38  function NAME(t): T_ANF → X =
    39    match t with
    40    | x → x
    41    | let x = t1 in t2 → NAME(t2)
    42
    43  function NAMES(t): T_ANF → P(X) =
    44    match t with
    45    | x → ∅
    46    | let x = _ in t2 → {x} ∪ NAMES(t2)


allow distributions over intrinsics with arity 0 (see Definition 3). Finally, we use 
weight only for its side effect (likelihood updating), and therefore weights do 
not produce any abstract values and consequently no constraints. 

The cases for abstractions (line 9), applications (line 13), and ifs (line 26) are more complex. The abstraction at line 4 in Fig. 4 generates (omitting the recursively generated constraints for the abstraction body $t_y$) the constraints $\{\lambda x_1.t_1 \in S_{f_1}\} \cup \{\mathit{unaligned}_{x_1} \Rightarrow \mathit{unaligned}_{t_1}\}$. The first constraint is standard: the abstract lambda $\lambda x_1.t_1$ flows to $S_{f_1}$. The second constraint states that if the abstraction is unaligned, all expressions in its body (here, only $t_1$) are unaligned. We define the sets of expressions within abstraction bodies and if branches through the NAMES function in Algorithm 1 (line 43).

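The application $f_5$ one at line 18 in Fig. 4 generates the constraints

$$
\begin{aligned}
\{\ & \forall z \forall y\ \lambda z.y \in S_{f_5} \Rightarrow (S_{\mathit{one}} \subseteq S_z) \land (S_y \subseteq S_{v_4}), \\
& \forall n\ (\mathit{const}\ n \in S_{f_5}) \land (n > 1) \Rightarrow \mathit{const}\ n - 1 \in S_{v_4}, \\
& \mathit{stoch} \in S_{f_5} \Rightarrow \mathit{stoch} \in S_{v_4}, \\
& \mathit{const}\ \_ \in S_{f_5} \Rightarrow (\mathit{stoch} \in S_{\mathit{one}} \Rightarrow \mathit{stoch} \in S_{v_4}), \\
& \mathit{unaligned}_{v_4} \Rightarrow (\forall y\ \lambda y.\_ \in S_{f_5} \Rightarrow \mathit{unaligned}_y), \\
& \mathit{stoch} \in S_{f_5} \Rightarrow (\forall y\ \lambda y.\_ \in S_{f_5} \Rightarrow \mathit{unaligned}_y)\ \}
\end{aligned}
\tag{3}
$$

The first constraint is standard: if an abstract value $\lambda z.y$ flows to $f_5$, the abstract values of one (the right-hand side) flow to $z$. Furthermore, the result of the application, given by the body name $y$, must flow to the result $v_4$ of the application.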
The second constraint is also relatively standard: if an intrinsic function of arity
n is applied, it produces a const of arity n — 1. The other constraints are new 
and specific for stochastic values and unalignment. The third constraint states 
that if the function is stochastic, the result is stochastic. The fourth constraint 
states that if we apply an intrinsic function to a stochastic argument, the result is 
stochastic. We could also make the analysis of intrinsic applications less conser- 
vative through intrinsic-specific constraints. The fifth and sixth constraints state 
that if the expression (labeled by v4) is unaligned or the function is stochastic, 
all abstract lambdas that flow to the function are unaligned. 

The if resulting in $f_5$ at line 14 in Fig. 4 generates (omitting the recursively generated constraints for the branches $t_t$ and $t_e$) the constraints

$$
\begin{aligned}
& \{S_{f_2} \subseteq S_{f_5},\ S_{f_3} \subseteq S_{f_5},\ \mathit{stoch} \in S_{a_1} \Rightarrow \mathit{stoch} \in S_{f_5}\} \\
& \cup \{\mathit{unaligned}_{f_5} \Rightarrow \mathit{unaligned}_{t_5}\} \cup \{\mathit{stoch} \in S_{a_1} \Rightarrow \mathit{unaligned}_{t_5}\}
\end{aligned}
\tag{4}
$$

The first two constraints are standard and state that the result of the branches flows to the result of the if expression. The remaining constraints are new. The third constraint states that if the condition is stochastic, the result is stochastic. The last two constraints state that if the if is unaligned or if the condition is stochastic, all names in the branches (here, only $t_5$) are unaligned.

Given constraints for a program, we need to compute a solution satisfying all constraints. We do this by repeatedly iterating through all the constraints and propagating abstract values accordingly. We terminate when we reach a fixed point, i.e., when no constraint results in an update of either $S_x$ or $\mathit{unaligned}_x$ for any $x$ in the program. We extend the 0-CFA constraint propagation algorithm to also handle the constraints generated for tracking stochastic values and unalignment. Specifically, the algorithm is a function ANALYZEALIGN: $T_{\mathrm{ANF}} \to ((X \to \mathcal{P}(A)) \times \mathcal{P}(X))$ that returns a map associating each variable to a set of abstract values and a set of unaligned variables. In other words, ANALYZEALIGN computes a solution to $S_x$ and $\mathit{unaligned}_x$ for each $x$ in the analyzed program. For example, ANALYZEALIGN($t_{\mathit{example}}$) results in


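$$
\begin{gathered}
S_{n_1} = \{\mathit{const}\ 1\} \quad S_{n_2} = \{\mathit{const}\ 1\} \quad S_{f_1} = \{\lambda x_1.t_1\} \quad S_{f_2} = \{\lambda x_2.t_2\} \\
S_{f_3} = \{\lambda x_3.t_3\} \quad S_{f_4} = \{\lambda x_4.t_4\} \quad S_{a_1} = \{\mathit{stoch}\} \quad S_{v_2} = \{\mathit{stoch}\} \\
S_{f_5} = \{\lambda x_2.t_2, \lambda x_3.t_3, \mathit{stoch}\} \quad S_{v_4} = \{\mathit{stoch}\} \quad S_n = \emptyset \mid \text{other } n \in X \\
\mathit{unaligned}_n = \mathit{true} \mid n \in \{t_2, t_3, t_4, t_5\} \quad \mathit{unaligned}_n = \mathit{false} \mid \text{other } n \in X.
\end{gathered}
\tag{5}
$$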


The example confirms our earlier intuition: an intrinsic (¬) flows to $n_1$, stoch flows to $a_1$, $f_5$ is stochastic and originates at either $(\lambda x_2.t_2)$ or $(\lambda x_3.t_3)$, and the unaligned variables are $t_2$, $t_3$, $t_4$, and $t_5$. We now give soundness results.
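The propagation to a fixed point can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation in Miking CorePPL: it applies the rules of Algorithm 1 directly to each let binding in a round-robin loop instead of generating constraint objects, the tuple encoding of terms and all function names are hypothetical, the arguments of assume and weight are dropped because no rule inspects them, and lambda parameters serve as identifiers for lambdas when propagating unalignment (they are filtered from the returned set).

```python
from collections import defaultdict

STOCH = "stoch"

def seq(bindings, result):
    # Build nested lets `let x1 = r1 in ... in result` from a list of bindings.
    t = ("var", result)
    for x, rhs in reversed(bindings):
        t = ("let", x, rhs, t)
    return t

def name(t):
    # NAME: the variable labeling the result of an ANF term.
    while t[0] == "let":
        t = t[3]
    return t[1]

def names(t):
    # NAMES: the let-bound names at the top level of an ANF term.
    out = set()
    while t[0] == "let":
        out.add(t[1])
        t = t[3]
    return out

def lets(t):
    # Yield every (x, rhs) binding, including those nested in lambdas and ifs.
    while t[0] == "let":
        _, x, rhs, t = t
        yield x, rhs
        if rhs[0] in ("lam", "if"):
            for sub in rhs[2:]:
                yield from lets(sub)

def analyze_align(t):
    S, unaligned = defaultdict(set), set()
    bindings = list(lets(t))
    changed = True

    def add(target, values):
        # Add values to target (in place) and remember whether anything changed.
        nonlocal changed
        new = set(values) - target
        if new:
            target |= new
            changed = True

    while changed:  # iterate until neither S nor unaligned is updated
        changed = False
        for x, rhs in bindings:
            kind = rhs[0]
            if kind == "var":
                add(S[x], S[rhs[1]])
            elif kind == "const" and rhs[1] > 0:
                add(S[x], {("const", rhs[1])})
            elif kind == "lam":
                y, body = rhs[1], rhs[2]
                add(S[x], {("lam", y, name(body))})
                if y in unaligned:
                    add(unaligned, names(body))
            elif kind == "app":
                f, arg = rhs[1], rhs[2]
                for a in list(S[f]):
                    if a == STOCH:
                        add(S[x], {STOCH})
                    elif a[0] == "lam":
                        add(S[a[1]], S[arg])
                        add(S[x], S[a[2]])
                        if x in unaligned or STOCH in S[f]:
                            add(unaligned, {a[1]})
                    elif a[0] == "const":
                        if a[1] > 1:
                            add(S[x], {("const", a[1] - 1)})
                        if STOCH in S[arg]:
                            add(S[x], {STOCH})
            elif kind == "if":
                y, tt, te = rhs[1], rhs[2], rhs[3]
                add(S[x], S[name(tt)])
                add(S[x], S[name(te)])
                if STOCH in S[y]:
                    add(S[x], {STOCH})
                if x in unaligned or STOCH in S[y]:
                    add(unaligned, names(tt) | names(te))
            elif kind == "assume":
                add(S[x], {STOCH})
    return dict(S), unaligned & {x for x, _ in bindings}

def lam(i):
    return ("lam", f"x{i}", seq([(f"t{i}", ("weight",))], f"t{i}"))

example = seq([  # hypothetical encoding of the program in Fig. 4
    ("n1", ("const", 1)), ("n2", ("const", 1)), ("one", ("const", 0)),
    ("half", ("const", 0)), ("c", ("const", 0)),
    ("f1", lam(1)), ("f2", lam(2)), ("f3", lam(3)), ("f4", lam(4)),
    ("bern", ("const", 1)), ("d1", ("app", "bern", "half")), ("a1", ("assume",)),
    ("v1", ("app", "f1", "one")), ("v2", ("app", "n1", "a1")), ("v3", ("app", "n2", "c")),
    ("f5", ("if", "a1", seq([("t5", ("app", "f4", "one"))], "f2"), ("var", "f3"))),
    ("v4", ("app", "f5", "one")),
    ("i1", ("if", "c", seq([("t6", ("app", "f1", "one"))], "t6"), ("var", "one"))),
], "i1")

S, unaligned = analyze_align(example)
print(sorted(unaligned))  # ['t2', 't3', 't4', 't5']
```

A worklist that revisits only the bindings affected by an update would avoid the repeated full passes; the round-robin loop is chosen here only for brevity.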


Lemma 1 (0-CFA soundness). For every $t \in T_{\mathrm{ANF}}$, the solution produced by ANALYZEALIGN(t) satisfies the constraints GENERATECONSTRAINTS(t).


Proof. The well-known soundness of 0-CFA extends to the new alignment constraints. See, e.g., Nielson et al. [34, Chapter 3] and Shivers [42].
Theorem 1 (Alignment analysis soundness). Assume $t \in T_{\mathrm{ANF}}$, $A_t$ from Definition 6, and an assignment to $S_x$ and $\mathit{unaligned}_x$ for $x \in X$ according to ANALYZEALIGN(t). Let $\hat{A}_t = \{x \mid \neg \mathit{unaligned}_x\}$ and take arbitrary $\emptyset \vdash t \;{}^{s_1}{\Downarrow}^{l_1}_{w_1}\; v_1$ and $\emptyset \vdash t \;{}^{s_2}{\Downarrow}^{l_2}_{w_2}\; v_2$. Then, $l_1|_{\hat{A}_t} = l_2|_{\hat{A}_t}$ and consequently $\hat{A}_t \subseteq A_t$.


The proof uses simultaneous structural induction over the derivations $\emptyset \vdash t \;{}^{s_1}{\Downarrow}^{l_1}_{w_1}\; v_1$ and $\emptyset \vdash t \;{}^{s_2}{\Downarrow}^{l_2}_{w_2}\; v_2$. At corresponding stochastic branches or stochastic function applications in the two derivations, a separate structural induction argument shows that, for the let-sequences $l'_1$ and $l'_2$ of the two stochastic subderivations, $l'_1|_{\hat{A}_t} = l'_2|_{\hat{A}_t} = []$. Combined, the two arguments give the result.


The result $\hat{A}_t \subseteq A_t$ (cf. Definition 6) shows that the analysis is conservative.


4.3 Dynamic Alignment 


An alternative to static alignment is dynamic alignment, which we explored 
in early stages when developing the alignment analysis. Dynamic alignment is 
fully context sensitive and amounts to introducing variables in programs that 
track (at runtime) when evaluation enters stochastic branching. To identify these 
stochastic branches, dynamic alignment also requires a runtime data structure 
that keeps track of the stochastic values. Similarly to k-CFA, dynamic alignment 
is potentially more precise than the 0-CFA approach. However, we discovered 
that dynamic alignment introduces significant runtime overhead. Again, we note 
that the models in Section 7 do not require a context-sensitive analysis, justifying 
the choice of 0-CFA over dynamic alignment and k-CFA. 


5 Aligned SMC and MCMC 


This section presents detailed algorithms for aligned SMC (Section 5.1) and 
aligned lightweight MCMC (Section 5.2). For a more pedagogical introduction 
to the algorithms, see Section 2. We assume a basic understanding of SMC and 
Metropolis-Hastings MCMC algorithms (see, e.g., Bishop [4]). 


5.1 Aligned SMC 


We saw in Section 2.1 that SMC operates by executing many instances of t 
concurrently, and resampling them at calls to weight. Critically, resampling 
requires that the inference algorithm can both suspend and resume executions. 
Here, we assume that we can create execution instances e of the probabilistic 
program t, and that we can arbitrarily suspend and resume the instances. The 
technical details of suspension are beyond the scope of this paper. See Goodman 
and Stuhlmüller [14], Wood et al. [48], and Lundén et al. [25] for further details. 

Algorithm 2 presents all steps for the aligned SMC inference algorithm. Af- 
ter running the alignment analysis and setting up the n execution instances, 
the algorithm iteratively executes and resamples the instances. Note that the 
algorithm resamples only at aligned weights (see Section 2.1). 
Algorithm 2 Aligned SMC. The input is a program $t \in T_{\mathrm{ANF}}$ and the number of execution instances $n$.


1. Run the alignment analysis on $t$, resulting in $\hat{A}_t$ (see Theorem 1).

2. Initiate $n$ execution instances $\{e_i \mid i \in \mathbb{N}, 1 \le i \le n\}$ of $t$.

3. Execute all $e_i$ and suspend execution upon reaching an aligned weight (i.e., let x = weight w in t and $x \in \hat{A}_t$) or when the execution terminates naturally. The result is a new set of execution instances $e'_i$ with weights $w'_i$ accumulated from unaligned weights and the single final aligned weight during execution.

4. If all $e'_i = v_i$ (i.e., all executions have terminated and returned a value), terminate inference and return the set of weighted samples $(v_i, w'_i)$. The samples approximate the posterior probability distribution encoded by $t$.

5. Resample the $e'_i$ according to their weights $w'_i$. The result is a new set of unweighted execution instances $e''_i$. Set $e_i \leftarrow e''_i$. Go to 3.
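The steps of Algorithm 2 can be sketched in Python. This is a minimal sketch under several assumptions, not the RootPPL implementation: suspension is emulated by replaying recorded random draws from the start of the program (one of several possible techniques), the `aligned` flags are supplied by hand instead of coming from the alignment analysis, only Bernoulli draws are supported, and the names `Execution`, `step`, `aligned_smc`, and `model` are hypothetical.

```python
import random

class Suspend(Exception):
    """Raised to suspend an execution at an aligned weight."""

class Execution:
    def __init__(self, trace, target, rng):
        self.trace, self.pos = list(trace), 0  # recorded draws and replay position
        self.target = target    # suspend at the target-th aligned weight
        self.segment = 0        # number of aligned weights passed so far
        self.w = 1.0            # weight accumulated since the last resampling
        self.rng = rng

    def bernoulli(self, p):
        if self.pos == len(self.trace):        # past the recorded draws: draw fresh
            self.trace.append(self.rng.random() < p)
        self.pos += 1
        return self.trace[self.pos - 1]

    def weight(self, w, aligned):
        if self.segment == self.target - 1:    # ignore weights replayed from earlier rounds
            self.w *= w
        if aligned:
            self.segment += 1
            if self.segment == self.target:
                raise Suspend

def step(model, trace, target, rng):
    # Step 3: run until the next aligned weight or until termination.
    e = Execution(trace, target, rng)
    try:
        value, done = model(e), True
    except Suspend:
        value, done = None, False
    return e.trace, e.w, value, done

def aligned_smc(model, n, rng):
    traces, target, z = [[] for _ in range(n)], 1, 1.0   # step 2
    while True:
        runs = [step(model, tr, target, rng) for tr in traces]
        ws = [w for _, w, _, _ in runs]
        z *= sum(ws) / n                       # normalizing constant estimate
        if all(done for _, _, _, done in runs):
            return [(v, w) for _, w, v, _ in runs], z    # step 4
        traces = rng.choices([tr for tr, _, _, _ in runs], weights=ws, k=n)  # step 5
        target += 1

def model(e):
    x = e.bernoulli(0.5)
    if x:
        e.weight(2.0, aligned=False)           # inside a stochastic branch
    e.weight(3.0 if x else 1.0, aligned=True)  # reached in every execution
    return x

samples, z = aligned_smc(model, 5000, random.Random(0))
print(z, sum(v for v, _ in samples) / len(samples))
```

For this toy model, the exact normalizing constant is 0.5 · 2 · 3 + 0.5 · 1 = 3.5 and the posterior probability of true is 3/3.5 ≈ 0.86, so the printed estimates should lie close to these values.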


    if assume Bernoulli(0.5) then
      weight 1; weight 10; true
    else
      weight 10; weight 1; false

(a) Aligned better than unaligned.

    if assume Bernoulli(0.1) then
      weight 9;
      if assume Bernoulli(0.5)
      then weight 1.5 else weight 0.5;
      true
    else (weight 1; false)

(b) Unaligned better than aligned.


Fig. 5: Programs illustrating properties of aligned and unaligned SMC. Fig. (a) shows a program better suited for aligned SMC. Fig. (b) shows a program better suited for unaligned SMC.


We conjecture that aligned SMC is preferable over unaligned SMC for all 
practically relevant models, as the evaluation in Section 7 justifies. However, it 
is possible to construct contrived programs in which unaligned SMC has the 
advantage. Consider the programs in Fig. 5, both encoding Bernoulli(0.5) distri- 
butions in a contrived way using weights. Fig. 5a takes one of two branches with 
equal probability. Unaligned SMC resamples at the first weights in each branch, 
while aligned SMC does not because the branch is stochastic. Due to the differ- 
ence in likelihood, many more else executions survive resampling compared to 
then executions. However, due to the final weights in each branch, the branch 
likelihoods even out. That is, resampling at the first weights is detrimental, and 
unaligned SMC performs worse than aligned SMC. Fig. 5b also takes one of two 
branches, but now with unequal probabilities. However, the two branches still 
have equal posterior probability due to the weights. The nested if in the then 
branch does not modify the overall branch likelihood, but adds variance. Aligned 
SMC does not resample for any weight within the branches, as the branch is 
stochastic. Consequently, only 10% of the executions in aligned SMC take the 
then branch, while half of the executions take the then branch in unaligned SMC 
(after resampling at the first weight). Therefore, unaligned SMC better explores 
the then branch and reduces the variance due to the nested if, which results in 
overall better inference accuracy. We are not aware of any real model with the 
property in Fig. 5b. In practice, it seems best to always resample when using 
weight to condition on observed data. Such conditioning is, in practice, always 
done outside of stochastic branches, justifying the benefit of aligned SMC. 
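The branch likelihoods discussed for Fig. 5 can be verified with a short calculation. The following worked arithmetic is a sketch that uses only the probabilities and weights appearing in the two programs; the labels "then" and "else" denote the unnormalized posterior mass of each branch.

```latex
% Fig. 5a: both branches end with the same total mass.
\text{then}: 0.5 \cdot 1 \cdot 10 = 5, \qquad
\text{else}: 0.5 \cdot 10 \cdot 1 = 5.
% After only the first weight in each branch (where unaligned SMC resamples):
\text{then}: 0.5 \cdot 1 = 0.5, \qquad
\text{else}: 0.5 \cdot 10 = 5
\quad\Rightarrow\quad \frac{0.5}{0.5 + 5} = \frac{1}{11} \approx 9\%\ \text{then executions}.

% Fig. 5b: the nested if has expected weight 1, so the masses are again equal.
\text{then}: 0.1 \cdot 9 \cdot (0.5 \cdot 1.5 + 0.5 \cdot 0.5) = 0.1 \cdot 9 \cdot 1 = 0.9, \qquad
\text{else}: 0.9 \cdot 1 = 0.9.
% After only the first weight in each branch:
\text{then}: 0.1 \cdot 9 = 0.9, \qquad
\text{else}: 0.9 \cdot 1 = 0.9
\quad\Rightarrow\quad \tfrac{1}{2}\ \text{then executions}.
```

In Fig. 5a, resampling at the first weights therefore skews the population away from the true 50/50 split, whereas in Fig. 5b it moves the population from the prior 10/90 split to the 50/50 split before the high-variance nested if is executed.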
Algorithm 3 Aligned lightweight MCMC. The input is a program $t \in T_{\mathrm{ANF}}$, the number of steps $n$, and the global step probability $g > 0$.


1. Run the alignment analysis on $t$, resulting in $\hat{A}_t$ (see Theorem 1).

2. Set $i \leftarrow 0$, $k \leftarrow 1$, and $l \leftarrow 1$. Call RUN.

3. Set $i \leftarrow i + 1$. If $i = n$, terminate inference and return the samples $\{v_j \mid j \in \mathbb{N}, 0 \le j < n\}$. They approximate the probability distribution encoded by $t$.

4. Uniformly draw an index $1 \le j \le |s_{i-1}|$ at random. Set $\mathit{global} \leftarrow \mathit{true}$ with probability $g$, and $\mathit{global} \leftarrow \mathit{false}$ otherwise. Set $w'_{-1} \leftarrow 1$, $w' \leftarrow 1$, $k \leftarrow 1$, $l \leftarrow 1$, and $\mathit{reuse} \leftarrow \mathit{true}$. Call RUN.

5. Compute the Metropolis-Hastings acceptance ratio $A = \min\left(1, \dfrac{w_i}{w_{i-1}} \cdot \dfrac{w'}{w'_{-1}}\right)$.

6. With probability $A$, accept $v_i$ and go to 3. Otherwise, set $v_i \leftarrow v_{i-1}$, $w_i \leftarrow w_{i-1}$, $s_i \leftarrow s_{i-1}$, $p_i \leftarrow p_{i-1}$, $s'_i \leftarrow s'_{i-1}$, $p'_i \leftarrow p'_{i-1}$, and $n'_i \leftarrow n'_{i-1}$. Go to 3.


function RUN() = Run $t$ and do the following:

- Record the total weight $w_i$ accumulated from calls to weight.
- Record the final value $v_i$.
- At unaligned terms let c = assume d in t ($c \notin \hat{A}_t$), do the following.
  1. If $\mathit{reuse} = \mathit{false}$, $\mathit{global} = \mathit{true}$, $n'_{i-1,k,l} \neq c$, or if $s'_{i-1,k,l}$ does not exist, sample a value $x$ from $d$ and set $\mathit{reuse} \leftarrow \mathit{false}$. Otherwise, reuse the sample $x = s'_{i-1,k,l}$ and set $w'_{-1} \leftarrow w'_{-1} \cdot p'_{i-1,k,l}$ and $w' \leftarrow w' \cdot f_d(x)$.
  2. Set $s'_{i,k,l} \leftarrow x$, $p'_{i,k,l} \leftarrow f_d(x)$, and $n'_{i,k,l} \leftarrow c$.
  3. Set $l \leftarrow l + 1$. In the program, bind $c$ to the value $x$ and resume execution.
- At aligned terms let c = assume d in t ($c \in \hat{A}_t$), do the following.
  1. If $j = k$, $\mathit{global} = \mathit{true}$, or if $s_{i-1,k}$ does not exist, sample a value $x$ from $d$ normally. Otherwise, reuse the sample $x = s_{i-1,k}$. Set $w'_{-1} \leftarrow w'_{-1} \cdot p_{i-1,k}$ and $w' \leftarrow w' \cdot f_d(x)$.
  2. Set $s_{i,k} \leftarrow x$ and $p_{i,k} \leftarrow f_d(x)$.
  3. Set $k \leftarrow k + 1$, $l \leftarrow 1$, and $\mathit{reuse} \leftarrow \mathit{true}$. In the program, bind $c$ to the value $x$ and resume execution.


5.2 Aligned Lightweight MCMC 


Aligned lightweight MCMC is a version of lightweight MCMC [47], where the alignment analysis provides information about how to reuse random draws between executions. Algorithm 3, a Metropolis-Hastings algorithm in the context of PPLs, presents the details. Essentially, the algorithm executes the program repeatedly using the RUN function, and redraws one aligned random draw in each step, while reusing all other aligned draws and as many unaligned draws as possible (illustrated in Section 2.2). It is possible to formally derive the Metropolis-Hastings acceptance ratio in step 5. A key property in Algorithm 3 due to alignment (Definition 6) is that the length of $s_i$ (and $p_i$) is constant, as executing $t$ always results in the same number of aligned random draws.

In addition to redrawing only one aligned random draw, each step has a probability $g > 0$ of being global, meaning that inference redraws every random draw in the program. Occasional global steps fix problems related to slow mixing and ergodicity of lightweight MCMC identified by Kiselyov [21]. In a global step, the Metropolis-Hastings acceptance ratio reduces to $A = \min\left(1, \dfrac{w_i}{w_{i-1}}\right)$.


6 Implementation 


We implement the alignment analysis (Section 4), aligned SMC (Section 5.1), 
and aligned lightweight MCMC (Section 5.2) for the functional PPL Miking 
CorePPL [25], implemented as part of the Miking framework [7]. We implement
the alignment analysis as a core component in the Miking CorePPL compiler, 
and then use the analysis when compiling to two Miking CorePPL backends: 
RootPPL and Miking Core. RootPPL is a low-level PPL with built-in highly 
efficient SMC inference [25], and we extend the CorePPL to RootPPL compiler 
introduced by Lundén et al. [25] to support aligned SMC inference. Furthermore, 
we implement aligned lightweight MCMC inference standalone as a translation 
from Miking CorePPL to Miking Core. Miking Core is the general-purpose pro- 
gramming language of the Miking framework, currently compiling to OCaml. 

The idealized calculus in (1) does not capture all features of Miking CorePPL. 
In particular, the alignment analysis implementation must support records, vari- 
ants, sequences, and pattern matching over these. Extending 0-CFA to such lan- 
guage features is not new, but it does introduce a critical challenge for the align- 
ment analysis: identifying all possible stochastic branches. Determining stochas- 
tic ifs is straightforward, as we simply check if stoch flows to the condition. 
However, complications arise when we add a match construct (and, in general, 
any type of branching construct). Consider the extension 


$$
\begin{aligned}
t &::= \ldots \mid \mathtt{match}\ t\ \mathtt{with}\ p\ \mathtt{then}\ t\ \mathtt{else}\ t \mid \{k_1 = x_1, \ldots, k_n = x_n\} \\
p &::= x \mid \mathit{true} \mid \mathit{false} \mid \{k_1 = p, \ldots, k_n = p\} \\
& x, x_1, \ldots, x_n \in X \qquad k_1, \ldots, k_n \in K \qquad n \in \mathbb{N}
\end{aligned}
\tag{6}
$$

of (1), adding records and simple pattern matching. $K$ is a set of record keys. Assume we also extend the abstract values as $a ::= \ldots \mid \{k_1 = X_1, \ldots, k_n = X_n\}$, where $X_1, \ldots, X_n \subseteq X$. That is, we add an abstract record tracking the names in the program that flow to its entries. Consider the program match $t_1$ with {a = $x_1$, b = false} then $t_2$ else $t_3$. This match is, similar to ifs, stochastic if $\mathit{stoch} \in S_{t_1}$. It is also, however, stochastic in other cases. Assume we have two program variables, $x$ and $y$, such that $\mathit{stoch} \in S_x$ and $\mathit{stoch} \notin S_y$. Now, the match is stochastic if, e.g., $\{a = \{y\}, b = \{x\}\} \in S_{t_1}$, because whether the random value flowing from $x$ matches the pattern false depends on randomness. However, it is not stochastic if, instead, $S_{t_1} = \{\{a = \{x\}, b = \{y\}\}\}$. The randomness of $x$ does not influence whether or not the branch is stochastic, as the variable pattern $x_1$ for label a always matches.
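The record example above can be turned into a small check. The following Python sketch is only illustrative and is not the Miking CorePPL implementation: the tuple encodings of patterns and abstract records, the dictionary `S` mapping names to lists of abstract values, and the function name are all hypothetical, and variants and sequences are left out.

```python
STOCH = "stoch"

def match_may_be_stochastic(pattern, names, S):
    """Decide whether matching values flowing from `names` against `pattern`
    may depend on randomness.

    pattern: ("var", x) | ("bool", b) | ("rec", {key: pattern})
    S: maps names to lists of abstract values, each either STOCH or
       ("rec", {key: set of names}).
    """
    if pattern[0] == "var":
        return False                      # variable patterns always match
    for n in names:
        for a in S.get(n, []):
            if a == STOCH:
                return True               # a random value meets a refutable pattern
            if pattern[0] == "rec" and a[0] == "rec":
                # Recurse into the record entries that the pattern inspects.
                if any(match_may_be_stochastic(p, a[1].get(k, set()), S)
                       for k, p in pattern[1].items()):
                    return True
    return False

pattern = ("rec", {"a": ("var", "x1"), "b": ("bool", False)})

# stoch flows to x but not to y, as in the text.
S1 = {"x": [STOCH], "y": [], "t1": [("rec", {"a": {"y"}, "b": {"x"}})]}
S2 = {"x": [STOCH], "y": [], "t1": [("rec", {"a": {"x"}, "b": {"y"}})]}

print(match_may_be_stochastic(pattern, {"t1"}, S1))  # True
print(match_may_be_stochastic(pattern, {"t1"}, S2))  # False
```

In the first case, the random value from x reaches the refutable pattern false; in the second, it only reaches the variable pattern, which cannot fail.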

Our alignment analysis implementation handles the intricacies of identify- 
ing stochastic match cases for nested record, variant, and sequence patterns. In 
total, the alignment analysis, aligned SMC, and aligned lightweight MCMC im- 
plementations consist of approximately 1000 lines of code directly contributed 
as part of this paper. The code is available on GitHub [2]. 


7 Evaluation 


This section evaluates aligned SMC and aligned lightweight MCMC on a set 
of models encoded in Miking CorePPL: CRBD [33,39] in Sections 7.1 and 7.5, 
ClaDS [28,39] in Section 7.2, state-space aircraft localization in Section 7.3, 
and latent Dirichlet allocation (LDA) in Section 7.4. CRBD and ClaDS are non-trivial
models of considerable interest in evolutionary biology and phylogenetics [39]. 
Similarly, LDA is a non-trivial topic model [5]. Running the alignment analysis 
took approximately 5 ms-30 ms for all models considered in the experiment, 
justifying that the time complexity is not a problem in practice. 

We compare aligned SMC with standard unaligned SMC [14], which is identi- 
cal to Algorithm 2, except that it resamples at every call to weight. We carefully 
checked that automatic alignment corresponds to previous manual alignments 
of each model. For all SMC experiments, we estimate the normalizing constant 
produced as a by-product of SMC inference rather than the complete posterior 
distributions. The normalizing constant, also known as marginal likelihood or 
model evidence, frequently appears in Bayesian inference and gives the proba- 
bility of the observed data averaged over the prior. The normalizing constant 
is useful for model comparison as it measures how well different probabilistic 
models fit the data (a larger normalizing constant indicates a better fit). 
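
To make the by-product estimate concrete, the sketch below computes a log normalizing constant estimate from per-checkpoint particle weights. It assumes the textbook setting in which every checkpoint resamples and the weights are reset afterwards, so the estimate is the sum over checkpoints of the log of the average weight. The function names and the input layout are hypothetical and not taken from Miking CorePPL or RootPPL.

```python
import math
from typing import List


def log_mean_exp(log_weights: List[float]) -> float:
    """Numerically stable log of the average of exp(log_weights)."""
    m = max(log_weights)
    if m == -math.inf:
        return -math.inf  # all particles have zero weight
    total = sum(math.exp(lw - m) for lw in log_weights)
    return m + math.log(total / len(log_weights))


def log_normalizing_constant(log_weights_per_checkpoint: List[List[float]]) -> float:
    """Sum the log average weight over all resampling checkpoints."""
    return sum(log_mean_exp(ws) for ws in log_weights_per_checkpoint)


# Two checkpoints with two particles each (illustrative numbers only).
checkpoints = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.2), math.log(0.4)],
]
estimate = log_normalizing_constant(checkpoints)  # equals log(0.5 * 0.3)
```

For model comparison, the same data would be run through two models and the model with the larger estimate preferred.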

We ran aligned and unaligned SMC with Miking CorePPL and the RootPPL
backend configured for a single core (compiled with GCC 7.5.0). Lundén et
al. [25] show that the RootPPL backend is significantly more efficient than other
state-of-the-art PPL SMC implementations. We ran aligned and unaligned SMC
inference 300 times (and with 3 warmup runs) for each experiment for $10^4$, $10^5$,
and $10^6$ executions (also known as particles in SMC literature).

We compare aligned lightweight MCMC to lightweight MCMC. We implement
both versions as compilers from Miking CorePPL to Miking Core, which
in turn compiles to OCaml (version 4.12). The lightweight MCMC databases 
are functional-style maps from the OCaml Map library. We set the global step 
probability to 0.1 for both aligned lightweight MCMC and lightweight MCMC. 
We ran aligned lightweight and lightweight MCMC inference 300 times for each 
experiment. We burned 10% of samples in all MCMC runs. 

For all experiments, we used an Intel Xeon Gold 6136 CPU (12 cores)
and 64 GB of memory running Ubuntu 18.04.5.


7.1 SMC: Constant Rate Birth-Death (CRBD) 


This experiment considers the CRBD diversification model from [39] applied to 
the Alcedinidae phylogeny (Kingfisher birds, 54 extant species) [19]. We use fixed 
diversification rates to simplify the model, as unaligned SMC inference accuracy 
is too poor for the full model with priors over diversification rates. Aligned SMC 
is accurate for both the full and simplified models. The source code consists of 
130 lines of code. The total experiment execution time was 16 hours.

Fig. 6 presents the experiment results. Aligned SMC is roughly twice as fast
and produces superior estimates of the normalizing constant. Unaligned SMC
has not yet converged to the correct value -304.75 (available for this particular
model because the diversification rates are fixed) for $10^6$ particles, while aligned
SMC produces precise estimates already at $10^4$ particles. Excess resampling is a
significant factor in the increase in execution time for unaligned SMC, as each
execution encounters far more resampling checkpoints than in aligned SMC.
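
The difference in the number of resampling checkpoints can be illustrated with a small sketch. It assumes that an execution is summarized as a list of weight events, each flagged as aligned or unaligned, that unaligned SMC resamples at every weight, and that aligned SMC resamples only at aligned weights. The trace and the name `count_resampling_checkpoints` are hypothetical.

```python
from typing import List, Tuple

# A weight event: (log weight, whether the weight is aligned).
WeightEvent = Tuple[float, bool]


def count_resampling_checkpoints(trace: List[WeightEvent], aligned: bool) -> int:
    """Count how often one execution stops for resampling.

    With aligned=False every weight is a checkpoint. With aligned=True
    only aligned weights are checkpoints; unaligned weights would just be
    accumulated into the execution's weight.
    """
    checkpoints = 0
    for _log_weight, is_aligned in trace:
        if is_aligned or not aligned:
            checkpoints += 1
    return checkpoints


# Hypothetical execution: two aligned weights around a stochastic loop
# that happened to run five iterations, each with an unaligned weight.
trace = [(-1.0, True)] + [(-0.5, False)] * 5 + [(-2.0, True)]
unaligned_count = count_resampling_checkpoints(trace, aligned=False)  # 7
aligned_count = count_resampling_checkpoints(trace, aligned=True)  # 2
```

Because the loop length differs between executions, the unaligned count also varies per execution, whereas the aligned count is the same for all of them.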

Fig. 6: SMC experiment results for CRBD. The x-axes give the number of particles.
Fig. (a) shows execution times (in seconds) for aligned (gray) and unaligned
(white) SMC. Error bars show one standard deviation. Fig. (b) shows box plot log
normalizing constant estimates for aligned (gray) and unaligned (white) SMC.
The analytically computed log normalizing constant is -304.75.

Fig. 7: SMC experiment results for ClaDS. The x-axes give the number of particles.
Fig. (a) shows execution times (in seconds) for aligned (gray) and unaligned
(white) SMC. Error bars show one standard deviation. Fig. (b) shows box plot log
normalizing constant estimates for aligned (gray) and unaligned (white) SMC.
The average estimate for aligned SMC with $10^6$ particles is -314.35.


7.2 SMC: Cladogenetic Diversification Rate Shift (ClaDS) 


A limitation of CRBD is that the diversification rates are constant. ClaDS [28,39] 
is a set of diversification models that allow shifting rates over phylogenies. We 
evaluate the ClaDS2 model for the Alcedinidae phylogeny. As in CRBD, we use 
fixed (initial) diversification rates to simplify the model on account of unaligned 
SMC. The source code consists of 147 lines of code. Automatic alignment simplifies
the ClaDS2 model significantly, as manual alignment requires collecting
and passing weights around in unaligned parts of the program, which are later 
consumed by aligned weights. The total experiment execution time was 67 hours. 

Fig. 7 presents the experiment results. Twelve unaligned runs for $10^5$ particles
and nine runs for $10^6$ particles ran out of the preallocated stack memory for
each particle (10 kB). We omit these runs from Fig. 7. The consequence of not
aligning SMC is more severe than for CRBD. Aligned SMC is now almost seven
times faster than unaligned SMC, and the unaligned SMC normalizing constant
estimates are significantly worse than the aligned SMC estimates. The
unaligned SMC estimates do not even improve when moving from $10^4$ to $10^6$
particles (we need even more particles to see improvements). Again, aligned
SMC produces precise estimates already at $10^4$ particles.

Fig. 8: SMC experiment results for the state-space aircraft localization model.
The x-axes give the number of particles. Fig. (a) shows execution times (in
seconds) for aligned (gray) and unaligned (white) SMC. Error bars show one
standard deviation. Fig. (b) shows box plot log normalizing constant estimates on
the y-axis for aligned (gray) and unaligned (white) SMC. The average estimate
for aligned SMC with $10^6$ particles is -61.26.


7.3 SMC: State-Space Aircraft Localization 


This experiment considers an artificial but non-trivial state-space model for
aircraft localization. The source code consists of 62 lines of code. The total
experiment execution time was 1 hour.

Fig. 8 presents the experiment results. The execution time difference is not as
significant as for CRBD and ClaDS. However, the unaligned SMC normalizing
constant estimates are again much less precise. Aligned SMC is accurate (centered
at approximately -61.26) already at $10^4$ particles. The model's straightforward
control flow explains the less dramatic difference in execution time: there
are at most ten unaligned likelihood updates in the aircraft model, while the
number is, in theory, unbounded for CRBD and ClaDS. Therefore, the cost of
extra resampling compared to aligned SMC is not as significant.


7.4 MCMC: Latent Dirichlet Allocation (LDA) 


This experiment considers latent Dirichlet allocation (LDA), a topic model used 
in the evaluations by Wingate et al. [47] and Ritchie et al. [38]. We use a synthetic 
data set, comparable in size to the data set used by Ritchie et al. [38], with a 
vocabulary of 100 words, 10 topics, and 25 documents each containing 30 words. 
Note that we are not using methods based on collapsed Gibbs sampling [17], and 
the inference task is therefore computationally challenging even with a rather 
small number of words and documents. The source code consists of 31 lines of
code. The total experiment execution time was 41 hours.
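
For readers who want a data set of the same shape, the sketch below draws a synthetic corpus from the standard LDA generative process with 100 words, 10 topics, and 25 documents of 30 words each. The text does not state how the synthetic data was produced, so the procedure, the symmetric Dirichlet parameters `alpha` and `beta`, and all function names are assumptions of this sketch.

```python
import random
from typing import List


def sample_dirichlet(rng: random.Random, concentration: float, size: int) -> List[float]:
    """Draw from a symmetric Dirichlet by normalizing gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_lda_corpus(
    rng: random.Random,
    vocabulary: int = 100,
    topics: int = 10,
    documents: int = 25,
    words_per_document: int = 30,
    alpha: float = 1.0,  # assumed document-topic concentration
    beta: float = 1.0,  # assumed topic-word concentration
) -> List[List[int]]:
    """Return documents as lists of word indices."""
    topic_word = [sample_dirichlet(rng, beta, vocabulary) for _ in range(topics)]
    corpus = []
    for _ in range(documents):
        doc_topic = sample_dirichlet(rng, alpha, topics)
        doc = []
        for _ in range(words_per_document):
            # Pick a topic for this word, then a word from that topic.
            z = rng.choices(range(topics), weights=doc_topic)[0]
            w = rng.choices(range(vocabulary), weights=topic_word[z])[0]
            doc.append(w)
        corpus.append(doc)
    return corpus


corpus = generate_lda_corpus(random.Random(0))
```

Each of the 750 words comes with a latent topic assignment, which is why inference without collapsing remains demanding even at this size.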

The LDA model consists of only aligned random draws. As a consequence,
aligned lightweight MCMC and lightweight MCMC reduce to the same inference
algorithm, and we can compare the algorithms by just considering the execution
times. The experiment also justifies the correctness of both algorithms.

Fig. 9: MCMC experiment results for LDA showing execution time (in seconds)
for aligned lightweight MCMC (gray) and lightweight MCMC (white). Error bars
show one standard deviation and the x-axis the number of MCMC iterations.

Fig. 9 presents the experiment results. Aligned lightweight MCMC is almost
three times faster than lightweight MCMC. To justify the execution times
of our implementations, we also implemented and ran the experiment with
lightweight MCMC in WebPPL [14] for 10° iterations, repeated 50 times (and 
with 3 warmup runs). The mean execution time was 383 s with standard devia- 
tion 5 s. We used WebPPL version 0.9.15 and Node version 16.18.0. 


7.5 MCMC: Constant Rate Birth-Death (CRBD) 


This experiment again considers CRBD. MCMC is not as suitable for CRBD as 
SMC, and therefore we use a simple synthetic phylogeny with six leaves and an 
age span of 5 age units (Alcedinidae used for the SMC experiment has 54 leaves 
and an age span of 35 age units). The source code for the complete model is the 
same as in Section 7.1, but we now allow the use of proper prior distributions 
for the diversification rates. The total experiment execution time was 7 hours. 


Unlike LDA, the CRBD model contains both unaligned and aligned random 
draws. Because of this, aligned lightweight MCMC and standard lightweight 
MCMC do not reduce to the same algorithm. To judge the difference in infer- 
ence accuracy, we consider the mean estimates of the birth diversification rate 
produced by the two algorithms, in addition to execution times. The experiment
results show that the posterior distribution over the birth rate is unimodal,
which motivates using the posterior mean as a measure of accuracy.
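
The accuracy measure can be sketched as follows: discard the first 10% of the MCMC samples as burn-in, as in all MCMC runs above, and average the rest. The function name and the sample values are hypothetical, and the sketch assumes that burn-in removes the earliest samples of the chain.

```python
from typing import List


def posterior_mean(samples: List[float], burn_fraction: float = 0.1) -> float:
    """Average the samples that remain after discarding the burn-in."""
    burn = int(len(samples) * burn_fraction)
    kept = samples[burn:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    return sum(kept) / len(kept)


# Illustrative chain: one early outlier followed by nine stable samples.
birth_rate_samples = [5.0] + [0.3] * 9
mean_estimate = posterior_mean(birth_rate_samples)  # close to 0.3
```

A single mean is only a reasonable summary because the posterior is unimodal; for a multimodal posterior, it could fall between modes.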


Fig. 10 presents the experiment results. Aligned lightweight MCMC is ap- 
proximately 3.5 times faster than lightweight MCMC. There is no obvious dif- 
ference in accuracy. To justify the execution times and correctness of our im- 
plementations, we also implemented and ran the experiment with lightweight 
MCMC in WebPPL [14] for $3 \cdot 10^6$ iterations, repeated 50 times (and with 3
warmup runs). The mean estimates agreed with Fig. 10. The mean execution 
time was 37.1 s with standard deviation 0.8 s. The speedup compared to stan- 
dard lightweight MCMC in Miking CorePPL is likely explained by the use of 
early termination in WebPPL, which benefits CRBD. Early termination easily 
combines with alignment but relies on execution suspension, which we do not 
currently use in our implementations. Note that aligned lightweight MCMC is 
faster than WebPPL even without early termination. 


In conclusion, the experiments clearly demonstrate the need for alignment. 


(a) Execution times. (b) Birth rate mean estimates.

Fig. 10: MCMC experiment results for CRBD. The x-axes give the number of
iterations. Fig. (a) shows execution times (in seconds) for aligned lightweight
MCMC (gray) and lightweight MCMC (white). Error bars show one standard
deviation. Fig. (b) shows box plot posterior mean estimates of the birth rate for
aligned lightweight MCMC (gray) and lightweight MCMC (white). The average
estimate for aligned lightweight MCMC with $3 \cdot 10^6$ iterations is 0.33.


8 Related Work 


The approach by Wingate et al. [47] is closely related to ours. A key similarity 
with alignment is that executions reaching the same aligned checkpoint also 
have matching stack traces according to Wingate et al.’s addressing transform. 
However, Wingate et al. do not consider the separation between unaligned and 
aligned parts of the program, their approach is not static, and they do not 
generalize to other inference algorithms such as SMC. 

Ronquist et al. [39], Turing [12], Anglican [48], Paige and Wood [36], and van 
de Meent et al. [46] consider the alignment problem. Manual alignment is critical 
for the models in Ronquist et al. [39] to make SMC inference tractable, which 
strongly motivates the automatic alignment approach. The documentation of 
Turing states that: “The observe statements [i.e., likelihood updates] should be
arranged so that every possible run traverses all of them in exactly the same 
order. This is equivalent to demanding that they are not placed inside stochastic 
control flow” [1]. Turing does not include any automatic checks for this property. 
Anglican [48] checks, at runtime (resulting in overhead), that all SMC executions 
encounter the same number of likelihood updates, and thus resamples the same 
number of times. If not, Anglican reports an error: “some observe directives [i.e.,
likelihood updates] are not global”. This error refers to the alignment problem, 
but the documentation does not explain it further. Probabilistic C, introduced by 
Paige and Wood [36], similarly assumes that the number of likelihood updates 
is the same in all executions. Van de Meent et al. [46] state, in reference to 
SMC: “Each breakpoint [i.e., checkpoint] needs to occur at an expression that
is evaluated in every execution of a program”. Again, they do not provide any 
formal definition of alignment nor an automatic solution to enforce it. 
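
A runtime check of the kind described for Anglican can be sketched as below. This is not Anglican's code: the function name, the input format (one likelihood update count per SMC execution), and the error message are assumptions. It illustrates that such a check detects a violation only after the executions have run, whereas a static analysis identifies aligned checkpoints before inference starts.

```python
from typing import Sequence


def check_same_number_of_updates(update_counts: Sequence[int]) -> int:
    """Return the common number of likelihood updates, or raise an error.

    update_counts holds, per execution, how many likelihood updates that
    execution encountered.
    """
    distinct = set(update_counts)
    if len(distinct) != 1:
        raise ValueError(
            f"executions encountered differing numbers of likelihood updates: {sorted(distinct)}"
        )
    return distinct.pop()


# Three executions that all encountered four likelihood updates.
common = check_same_number_of_updates([4, 4, 4])
```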

Lundén et al. [24] briefly mention the general problem of selecting optimal 
resampling locations in PPLs for SMC but do not consider the alignment problem 
in particular. They also acknowledge the overhead resulting from not all SMC 
executions resampling the same number of times, which alignment avoids. 

The PPLs Birch [31], Pyro [3], and WebPPL [14] support SMC inference.
Birch and Pyro enforce alignment for SMC as part of model construction. Note 
that this is only true for SMC in Pyro—other Pyro inference algorithms use 
other modeling approaches. The approaches in Birch and Pyro are sound but 
demand more of their users compared to the alignment approach. WebPPL does 
not consider alignment and resamples at all likelihood updates for SMC. 

Ritchie et al. [38] and Nori et al. [35] present MCMC algorithms for proba- 
bilistic programs. Ritchie et al. [38] optimize lightweight MCMC by Wingate et 
al. [47] through execution suspensions and callsite caching. The optimizations are
independent of, and potentially combine well with, aligned lightweight MCMC.
Another MCMC optimization which potentially combines well with alignment 
is due to Nori et al. [35]. They use static analysis to propagate observations 
backwards in programs to improve inference. 

Information flow analyses [40] may determine if particular parts of a program 
execute as a result of different program inputs. Specifically, if program input is 
random, such approaches have clear similarities to the alignment analysis. 

Many other PPLs exist, such as Gen [10], Venture [29], Edward [44], Stan [8], 
and AugurV2 [18]. Gen, Venture, and Edward focus on simplifying the joint 
specification of a model and its inference to give users low-level control, and do 
not consider automatic alignment specifically. However, the incremental inference 
approach [9] in Gen does use the addressing approach by Wingate et al. [47]. Stan 
and AugurV2 have less expressive modeling languages to allow more powerful 
inference. Alignment holds by construction because of the reduced expressiveness.

Borgström et al. [6], Staton et al. [43], Ścibior et al. [41], and Vákár et al. [45]
treat semantics and correctness for PPLs, but do not consider alignment. 


9 Conclusion 


This paper gives, for the first time, a formal definition of alignment in PPLs. 
Furthermore, we introduce a static analysis technique, use it to align checkpoints
in PPLs, and apply it to SMC and MCMC inference. We formalize the
alignment analysis, prove its correctness, and implement it in Miking CorePPL. 
We also implement aligned SMC and aligned lightweight MCMC, and evaluate 
the implementations on non-trivial CRBD and ClaDS models from phylogenet- 
ics, the LDA topic model, and a state-space model, demonstrating significant 
improvements compared to standard SMC and lightweight MCMC. 